Principal Researcher and Research Manager at Microsoft Research (MSR), Redmond, USA. My research is in computer vision, speech signal processing, multi-sensory fusion, multimedia computing, real-time collaboration, and human-machine interaction.

I manage the Multimedia, Interaction, and eXperiences (MIX) Group. I was previously affiliated with the Communication and Collaboration Systems Group, and the Speech Technology Group.

My research interests include:

  • Computer vision and graphics: calibration, matching, stereo, motion, 3D modeling, 3D display
  • Audio processing and rendering, speech processing, spatial audio, multichannel AEC
  • Audio-visual fusion, active object detection and tracking
  • Multimedia, human-computer interaction, human-human communication and collaboration
  • Biology-inspired learning, autonomous mental development
  • Human information processing: face/speaker recognition/verification, activity recognition and understanding


Emotion Recognition for MS Cognitive Services

Established: July 1, 2015

Emotion Recognition takes an image with faces as input and returns the confidence across a set of emotions for each face in the image, as well as a bounding box for the face (using the MS Face API). The algorithm infers emotions from appearance using a custom deep convolutional network. To improve labeling, we had each image labeled by more than 10 taggers through crowdsourcing, which allowed us to learn a probability distribution over emotions for each image. We are also sharing…
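The crowd-labeling idea above can be sketched in a few lines: the raw tag counts from the taggers become a soft target distribution, and the network is trained against that distribution rather than a single hard label. This is only an illustrative sketch, not the project's actual training code; the emotion list and function names are assumptions.

```python
import numpy as np

# Hypothetical emotion set; the actual API's categories may differ.
EMOTIONS = ["anger", "contempt", "disgust", "fear",
            "happiness", "neutral", "sadness", "surprise"]

def soft_target(tag_counts):
    """Turn raw crowd tag counts for one image into a probability
    distribution over emotions (the soft training target)."""
    counts = np.asarray(tag_counts, dtype=float)
    return counts / counts.sum()

def soft_cross_entropy(pred_logits, target_dist):
    """Cross-entropy between the network's softmax output and the
    crowd-derived label distribution (the per-image training loss)."""
    z = pred_logits - pred_logits.max()          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-(target_dist * log_probs).sum())
```

For example, if 10 taggers split 7/2/1 between happiness, neutral, and surprise, the target becomes [0, 0, 0, 0, 0.7, 0.2, 0, 0.1], and the loss rewards a network that reproduces that ambiguity instead of forcing a single hard label.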

Eye-Gaze Tracking for Improved Natural User Interaction

Established: April 12, 2014

We develop novel eye-gaze tracking technologies to make eye-gaze tracking ubiquitously available for improved natural user interaction (NUI). In particular, we investigate two approaches. Active IR lighting: we use many (more than four) IR lights distributed around a monitor border. RGB + Depth: we leverage both the RGB camera and the depth sensor already available in Kinect…

ViiBoard: Vision-enhanced Immersive Interaction with Touch Board

Established: April 11, 2014

ViiBoard uses vision techniques to significantly enhance the user experience on large touch displays (e.g. Microsoft Perceptive Pixel) in two areas: human computer interaction and immersive remote collaboration. Simple Setup ViiBoard uses only an RGBD camera (Microsoft Kinect), mounted on the side of a large touch display, to enhance user interaction and enable 3D immersive collaboration in a desirable form factor, practical for home or office use. Part I: Vision-enhanced Interaction ViiBoard augments the touch…

Mobile Surface

Established: March 1, 2010

Mobile Surface is a novel interaction system for mobile computing. Our goal is to bring the Microsoft Surface experience to mobile scenarios and, more importantly, to enable 3D interaction with mobile devices. We research how to transform any surface (e.g., a coffee table or a piece of paper) into a Mobile Surface with a mobile device and a camera-projector system. Beyond this, our work also covers how to acquire 3D object models in real time, augmented reality…

Audio Spatialization and Multichannel Acoustic Echo Cancellation (AEC)

Established: August 9, 2008

About the System

In multiparty conferencing, one hears the voices of multiple remote participants. Current commercial systems mix them into a single mono audio stream, so all remote voices sound as if they come from the same location when played over loudspeakers, or from inside the listener's head when using headphones. This is in sharp contrast to real life, where each voice has its own distinct location. We have built and…
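The idea of giving each remote voice its own location can be illustrated with a toy binaural renderer: each mono voice gets an interaural time difference (ITD) and level difference (ILD) for its assigned azimuth, and the spatialized streams are summed. This is only a sketch under simplified assumptions; a real system would use measured HRTFs and handle the multichannel echo-cancellation side, and all names and constants here are illustrative.

```python
import numpy as np

FS = 16000  # sample rate in Hz (assumed)

def spatialize(mono, azimuth_deg, fs=FS):
    """Toy binaural renderer: place a mono voice at an azimuth using an
    interaural time difference (ITD) and level difference (ILD)."""
    az = np.radians(azimuth_deg)
    itd = 0.0007 * np.sin(az)                   # ~0.7 ms max ear-to-ear delay
    delay = int(round(abs(itd) * fs))
    gain_near = 1.0
    gain_far = 10 ** (-6 * abs(np.sin(az)) / 20)  # up to ~6 dB ILD
    delayed = np.concatenate([np.zeros(delay), mono])[: len(mono)]
    if azimuth_deg >= 0:    # source on the right: left ear is far and late
        left, right = gain_far * delayed, gain_near * mono
    else:
        left, right = gain_near * mono, gain_far * delayed
    return np.stack([left, right], axis=1)

def mix_participants(voices_and_azimuths, fs=FS):
    """Sum several spatialized voices into one stereo stream, so each
    remote participant is heard at a distinct location."""
    return sum(spatialize(v, az, fs) for v, az in voices_and_azimuths)
```

Placing two participants at, say, -45 and +45 degrees already separates their voices perceptually, which is exactly what a mono downmix destroys.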

Personal Telepresence Station

Established: July 29, 2008

With globalization and workforce mobility, there is a strong need for research and development of advanced infrastructures and tools that bring an immersive experience to teleconferencing, so people across geographically distributed sites can interact collaboratively. The Personal Telepresence Station project aims to bring the telepresence experience to offices, replicating what people enjoy in face-to-face meetings, such as gaze awareness and spatial audio.

Microphone Array Audio Spatialization

Established: May 31, 2008

Enhancing Stereophonic Conferencing with Microphone Arrays Through Sound Field Warping and Audio Spatialization

Audio samples:

  • mono.wma: Traditional mono output from a microphone array.
  • speaker.wma: Array audio spatialized over a pair of loudspeakers.
  • headphone.wma: Array audio spatialized over headphones.

Speaker Verification: Text-Dependent vs. Text-Independent

Established: August 20, 2006

Speaker verification is the process of verifying the claimed identity of a speaker based on the speech signal from the speaker (voiceprint). There are two types of speaker verification systems: Text-Independent Speaker Verification (TI-SV) and Text-Dependent Speaker Verification (TD-SV). TD-SV requires the speaker to say exactly the enrolled or given password, while TI-SV verifies identity without any constraint on the speech content. Compared to TD-SV, TI-SV is more convenient because…

Projector-Whiteboard-Camera System for Remote Collaboration

Established: October 2, 2004

Visual Echo Cancellation for Seamless Integration of Remote Sites

In a typical remote collaboration setup, two or more projector-camera pairs are "cross-wired" to form a full-duplex system for two-way communication. A whiteboard can be used as the projector screen, in which case the whiteboard serves as both an output device and an input device. Users can write on the whiteboard to comment on what is projected or to add new thoughts in…

Who Is Talking To You (WITTY)

Established: August 9, 2003

Mission Statement

Exploit multi-sensory information to improve user experience in:

  • Speech-centric human-computer interaction
  • Computer-mediated human inter-communication

Goals

  • Understand end-users' requirements
  • Identify sensor requirements
  • Prototype new hardware
  • Develop robust technologies

Publications

  • Air-and-Bone Conductive Integrated Microphones for Robust Speech Detection and Enhancement, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU03), November 30 - December 4, 2003.
  • Multi-Sensory Microphones for Robust Speech Detection, Enhancement and Recognition, IEEE…


Established: August 6, 2002

Digital Technology for Effective Whiteboard Use

Introduction

The whiteboard is ubiquitous and will exist for the foreseeable future, but its content is hard to archive and share. While digital cameras can be used to capture whiteboard content, the image is usually taken from an angle, contains irrelevant information, and has shadows. We have developed an intelligent and automatic technique to reproduce the whiteboard content as a crisp and faithful image which can be archived or shared with…
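One step of such a pipeline, making the board render white and the pen strokes crisp, can be sketched as local background normalization: estimate the bright background level in each cell of the image and divide by it. This is only a minimal illustration under assumed parameters, not the project's actual algorithm, which also rectifies the perspective and crops the board region.

```python
import numpy as np

def enhance_whiteboard(gray, cell=16):
    """Toy whiteboard enhancement: estimate the (bright) background
    level in each cell of a grayscale image, then normalize so the
    board renders white and pen strokes stay dark."""
    h, w = gray.shape
    out = np.empty_like(gray, dtype=float)
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            block = gray[y:y+cell, x:x+cell].astype(float)
            bg = np.percentile(block, 90)   # background ~ brightest pixels
            out[y:y+cell, x:x+cell] = block / max(bg, 1.0)
    return np.clip(out * 255, 0, 255).astype(np.uint8)
```

Local (per-cell) rather than global normalization is what removes the uneven shading and shadows mentioned above, since each region is scaled by its own background estimate.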

Eye-Gaze Correction for Video Telecommunications

Established: May 1, 2002

The lack of eye contact in desktop video teleconferencing substantially reduces the effectiveness of video contents. While expensive and bulky hardware is available on the market to correct eye gaze, researchers have been trying to provide a practical software-based solution to bring video-teleconferencing one step closer to the mass market. This paper presents a novel approach that is based on stereo analysis combined with rich domain knowledge (a personalized face model). This marriage is mutually…

Human Action and Activity Recognition

Established: August 9, 2001

Expandable Data-Driven Graphical Modeling of Human Actions Based on Salient Postures. This paper presents a graphical model for learning and recognizing human actions. Specifically, we propose to encode actions in a weighted directed graph, referred to as action graph, where nodes of the graph represent salient postures that are used to characterize the actions and are shared by all actions. The weight between two nodes measures the transitional probability between the two postures represented by…

Re-rendering from a Sparse Set of Images

Established: June 1, 2001

We present a framework for view-dependent rendering from arbitrary viewpoints and relighting under novel illumination conditions of a real object from a sparse set of images and a pre-acquired geometric model of the object. Using a 3D model and a small set of images of an object, we recover all the necessary photometric information for subsequent rendering. We recover the illumination distribution, represented as a hemisphere covering the object, as well as the parameters of…

Face Modeling

Established: May 5, 2001

Overview: 3D Modeling

Generating realistic 3D human face models and facial animations has been a persistent challenge in computer vision and graphics. We have developed a system that constructs textured 3D face models from videos with minimal user interaction. Our system takes a video sequence of a face captured with an ordinary video camera. After five manual clicks on two images to tell the system where the eye corners, nose tip and mouth corners are, the…


Established: November 15, 2000

Transforming an ordinary paper into a wireless mobile input device. Virtual mouse, keyboard and 3D controller with an ordinary piece of paper. Abstract In many intelligent environments, instead of using conventional mice, keyboards and joysticks, people are looking for an intuitive, immersive and cost-efficient interaction device. We are developing a vision-based gesture interface prototype system, VisualPanel, which employs an arbitrary quadrangle-shaped panel (e.g., an ordinary paper) and a tip pointer (e.g., fingertip) as an intuitive,…


Established: November 5, 2000

Transforming an ordinary screen into a touch screen with a camera.

Touch screens are very convenient because one can point directly at what is of interest. This paper presents an inexpensive technique to transform an ordinary screen into a touch screen using an ordinary camera. The setup is easy: position a camera so it can see the whole screen. The system calibration involves the detection of the screen region in the image, which determines…
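The calibration step described above amounts to estimating a plane-to-plane homography: once the four screen corners are detected in the camera image, any fingertip position in camera pixels can be mapped to screen coordinates. A minimal sketch using the standard direct linear transform (DLT), with hypothetical function names:

```python
import numpy as np

def homography_from_corners(cam_pts, scr_pts):
    """Estimate the 3x3 homography mapping camera-image points to
    screen coordinates from four corner correspondences (the
    calibration step). Standard DLT: stack two linear constraints per
    correspondence and take the SVD null vector."""
    A = []
    for (x, y), (u, v) in zip(cam_pts, scr_pts):
        A.append([x, y, 1, 0, 0, 0, -u*x, -u*y, -u])
        A.append([0, 0, 0, x, y, 1, -v*x, -v*y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def cam_to_screen(H, x, y):
    """Map a fingertip detected at camera pixel (x, y) to screen
    coordinates via the homography (homogeneous divide)."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

Four correspondences determine the homography exactly; with more detected points a least-squares fit over the same linear system would reduce noise.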

A Flexible New Technique for Camera Calibration

Established: December 2, 1999

We propose a flexible new technique to easily calibrate a camera. It is well suited for use without specialized knowledge of 3D geometry or computer vision. The technique only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form…
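The closed-form part of the procedure can be illustrated in a deliberately simplified setting: assume square pixels, zero skew, and a principal point at the image origin, so the only unknown intrinsic is the focal length f. Writing the plane-to-image homography as H = K[r1 r2 t] with K = diag(f, f, 1), the orthonormality of r1 and r2 gives two closed-form estimates of f² per view. This sketch is not the full method, which recovers the complete intrinsic matrix from two or more views and then refines all parameters (including radial distortion) nonlinearly.

```python
import numpy as np

def focal_from_homography(H):
    """Recover the focal length from one planar homography under the
    simplifying assumptions above. With H = K [r1 r2 t] and
    K = diag(f, f, 1), r1.r2 = 0 and |r1| = |r2| each yield f^2 in
    closed form; the estimates are scale-invariant in H."""
    h1, h2 = H[:, 0], H[:, 1]
    # r1 . r2 = 0  =>  (h11*h12 + h21*h22)/f^2 + h31*h32 = 0
    f2_a = -(h1[0]*h2[0] + h1[1]*h2[1]) / (h1[2]*h2[2])
    # |r1|^2 = |r2|^2  =>  (h11^2+h21^2-h12^2-h22^2)/f^2 + h31^2-h32^2 = 0
    f2_b = -(h1[0]**2 + h1[1]**2 - h2[0]**2 - h2[1]**2) / (h1[2]**2 - h2[2]**2)
    return np.sqrt((f2_a + f2_b) / 2)
```

In the real algorithm these same constraints, written for a full intrinsic matrix, become linear equations on the image of the absolute conic, and stacking them across the observed orientations of the planar pattern yields all intrinsics at once.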


Computer Vision and Intelligent Services


July 28, 2015


Baining Guo, Martial Hebert, Tao Mei, and Zhengyou Zhang


Microsoft Research, Carnegie Mellon University, Microsoft



Short Bio

The full version of his résumé is available here.

Zhengyou Zhang is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) (2005, for contributions to robust computer vision techniques) and a Fellow of the Association for Computing Machinery (ACM) (2013, for contributions to computer vision and multimedia). He is the Founding Editor-in-Chief of the newly established IEEE Transactions on Autonomous Mental Development (IEEE T-AMD), and is on the Editorial Boards of the International Journal of Computer Vision (IJCV), Machine Vision and Applications, and the Journal of Computer Science and Technology (JCST). He was on the Editorial Board of the IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T-PAMI) from 1999 to 2005, the IEEE Transactions on Multimedia (IEEE T-MM) from 2004 to 2009, and the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) from 1997 to 2008, among others. He is listed in Who’s Who in the World, Who’s Who in America and Who’s Who in Science and Engineering.

Before joining Microsoft, Zhengyou worked for 11 years at INRIA (the French National Institute for Research in Computer Science and Control), where he was a member of the Computer Vision and Robotics group and had been a Senior Research Scientist since 1991. In 1996-1997, he spent a one-year sabbatical as an Invited Researcher at the Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan.

He holds more than 130 US patents and has about 20 patents pending. He also holds a few Japanese patents for his inventions during his sabbatical at ATR.

He has published over 200 papers in refereed international journals and conferences, and is the author of the following books

He has edited multiple books, including

He is a General Co-Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), June 2017, Puerto Rico, USA.

He is a General Co-Chair of the ACM International Conference on Multimodal Interaction (ICMI 2015), Nov. 3-9, 2015, Seattle, USA.

He was a General Co-Chair of the International Workshop on Multimedia Signal Processing (MMSP 2014), September 22-24, 2014, Jakarta, Indonesia.

He was a General Co-Chair of the International Workshop on Multimedia Signal Processing (MMSP 2011), October 17-19, 2011, Hangzhou, China.

He was the Chair of the new Technical Briefs program of the SIGGRAPH Asia, Singapore, November 28 – December 1, 2012.

He was a Program Co-Chair of the International Conference on Multimedia and Expo (ICME), July 2010, a Program Co-Chair of the ACM International Conference on Multimedia (ACM MM), October 2010, and a Program Co-Chair of the ACM International Conference on Multimodal Interfaces (ICMI), November 2010. He was the Program Co-Chair of the 8th International Conference on Development and Learning (ICDL09), June 5-7, 2009, Shanghai, China. He was a Technical Co-Chair of the International Workshop on Multimedia Signal Processing (MMSP06), October 3-6, 2006, Victoria, BC, Canada. He was the Program Co-Chair of the Asian Conference on Computer Vision (ACCV2004), Jan. 27-30, 2004, Jeju Island, Korea; a Demo Chair and an Area Chair of the International Conference on Computer Vision (ICCV2003), Oct. 14-17, 2003, Nice, France; the Demo Chair of the International Conference on Computer Vision (ICCV2005), Oct. 15-21, 2005, Beijing, China. He co-organized the International Workshop on Multimedia Technologies in E-Learning and Collaboration, held in Nice, France, on October 17, 2003. He served on the Program Committees of ICCV, CVPR, ECCV, ACCV and many other international conferences and workshops.

He was a co-organizer of the First International Workshop on Human Activity Understanding from 3D Data (HAU3D) 2011, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, June 20-25, 2011; a co-organizer of the Second International Workshop on Human Activity Understanding from 3D Data (HAU3D) 2012, in conjunction with CVPR, Providence, Rhode Island, June 16-21, 2012; and a co-organizer of the Third International Workshop on Human Activity Understanding from 3D Data (HAU3D) 2013, in conjunction with CVPR, Portland, Oregon, June 25-27, 2013.

Zhengyou Zhang was a member of the IEEE Computer Society Fellows Committee from 2005 to 2007 and again in 2010 and 2011, a member of the IEEE Technical Committee on Multimedia Signal Processing (2006-2010), and the past Chair of the IEEE Technical Committee on Autonomous Mental Development (2007-2009).

Interview by the Computational Intelligence Magazine is available here (or go to IEEE Xplore).

Interview by the IEEE Signal Processing Magazine on “Telepresence: Virtual Reality in the Real World” (or go to IEEE Xplore). (November 2011)

Natural User Interfaces: What’s Next? Video on 3D Photorealistic Talking Head. (February 2011)



Published in Image and Vision Computing Journal, Vol.15, No.1, pages 59-76, 1997.

Collaborators, Post-Doctoral Researchers and Students

  • Zicheng Liu (Researcher, MSR)
  • Mike Sinclair (Principal Researcher, MSR)
  • Li-wei He (Research Engineer, MSR)
  • Cha Zhang (Researcher, MSR)
  • Rajesh Hegde (Research Engineer, MSR)
  • Dinei Florencio (Researcher, MSR)
  • Qin Cai (Research Engineer, MSR)
  • Wei-ge Chen (Software Architect, MSR)
  • Phil Chou (Principal Researcher, MSR)
  • Ying Shan (Post-Doc, now Scientist at Microsoft Online)
  • Gang Hua (Scientist at Nokia Research)
  • Ming-Ting Sun (Professor, University of Washington)
  • Wanqing Li (Associate Professor, University of Wollongong)
  • Chunhui Zhang (Researcher, MSR Asia, now at Alibaba)
  • John Hershey (Post-Doc, now at IBM Research)

Interns: Sasa Junuzovic (2008), Matt Luciw (2008), Aswin Sankaranarayanan (2008), Xiaogang Wang (2008), Qing Zhang (2008), Raffay Hamid (2007), Sasa Junuzovic (2007), Miao Liao (2007), Mingxuan Sun (2007), Qi Zhao (2007), Amar Subramanya (2006), Sasa Junuzovic (2006), Ming Liu (2005), Gang Hua (2005), Amar Subramanya (2004), Ya Chang (2004), Yanli Zheng (2003), Hanning Zhou (2003), Guodong Guo (2002), Ruigang Yang (2001), Ying Wu (2000), Ko Nishino (1999, 2000), Qifa Ke (1998)

Supervision of researchers when I was at INRIA: Nassir Navab (Ph.D., 1993), Michel Buffa (Ph.D., 1993), Gabriella Csurka (Ph.D., 1996), Bernard Hotz (Research Engineer, 1991-1994), Serge Saracco (Master, 1993), Jean-Francois Ponthieux (Master, 1993), Veit Schenk (Master, 1996), Laurence Lucido (Ph.D., 1997), Sylvain Bougnoux (Ph.D., 1998).