I am a Principal Researcher and Research Manager at Microsoft Research (MSR), Redmond, USA. My research is in computer vision, speech signal processing, multi-sensory fusion, multimedia computing, real-time collaboration, and human-machine interaction.

I manage the Multimedia, Interaction, and eXperiences (MIX) Group. I was previously affiliated with the Communication and Collaboration Systems Group and the Speech Technology Group.

My research interests include:

  • Computer vision and graphics: calibration, matching, stereo, motion, 3D modeling, 3D display
  • Audio processing and rendering, speech processing, spatial audio, multichannel AEC
  • Audio-visual fusion, active object detection and tracking
  • Multimedia, human-computer interaction, human-human communication and collaboration
  • Biology-inspired learning, autonomous mental development
  • Human information processing: face/speaker recognition/verification, activity recognition and understanding


Who Is Talking To You (WITTY)

Established: August 9, 2003

Mission Statement: Exploit multi-sensory information to improve user experience in

  • Speech-centric human computer interaction
  • Computer-mediated human inter-communication

Goals:

  • Understand end-users' requirements
  • Identify sensor(s) requirement
  • Prototype new hardware
  • Develop robust technologies

Publications: Publication 1: Air-and-Bone Conductive Integrated Microphones for…

Projector-Whiteboard-Camera System for Remote Collaboration

Established: October 2, 2004

Visual Echo Cancellation for Seamless Integration of Remote Sites. About: In a typical remote collaboration setup, two or more projector-camera pairs are "cross-wired" to form a full-duplex system for two-way communication. A whiteboard can be used as the projector screen,…

Speaker Verification: Text-Dependent vs. Text-Independent

Established: August 20, 2006

Speaker verification is the process of verifying the claimed identity of a speaker based on the speech signal from the speaker (voiceprint). There are two types of speaker verification systems: Text-Independent Speaker Verification (TI-SV) and Text-Dependent Speaker Verification (TD-SV). TD-SV…
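As a rough illustration of the verification decision described above, the sketch below compares a feature vector from a test utterance against the enrolled voiceprint of the claimed speaker and accepts the claim when the two are similar enough. The use of cosine similarity, the fixed-length feature vectors, the threshold value, and the names `cosine_similarity` and `verify` are all assumptions made for this example; they are not taken from the project.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enrolled: Sequence[float], test: Sequence[float],
           threshold: float = 0.8) -> bool:
    """Accept the claimed identity if the test utterance is close enough
    to the enrolled voiceprint. The threshold is a hypothetical value."""
    return cosine_similarity(enrolled, test) >= threshold


# Hypothetical feature vectors (illustrative numbers only).
enrolled_voiceprint = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
other_speaker = [0.1, 0.9, 0.2]

print(verify(enrolled_voiceprint, same_speaker))   # claim accepted
print(verify(enrolled_voiceprint, other_speaker))  # claim rejected
```

In a text-dependent system the test utterance would additionally have to match an expected phrase; that check is left out of this sketch.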

Microphone Array Audio Spatialization

Established: May 31, 2008

Enhancing Stereophonic Conferencing with Microphone Arrays Through Sound Field Warping and Audio Spatialization

Audio samples:

  • mono.wma: Traditional mono output from a microphone array.
  • speaker.wma: Array audio spatialized over a pair of loudspeakers.
  • headphone.wma: Array audio spatialized over headphones.

Audio Spatialization and Multichannel Acoustic Echo Cancellation (AEC)

Established: August 9, 2008

About the System: In multiparty conferencing, one hears the voices of more than one remote participant. Current commercial systems mix them into a single mono audio stream, so all remote participants' voices sound as if they come from the same…

Human Action and Activity Recognition

Established: August 9, 2001

Expandable Data-Driven Graphical Modeling of Human Actions Based on Salient Postures. This paper presents a graphical model for learning and recognizing human actions. Specifically, we propose to encode actions in a weighted directed graph, referred to as action graph, where…


Established: November 5, 2000

Transforming an ordinary screen into a touch screen with a camera. About: Touch screens are very convenient because one can point directly at whatever is of interest. This paper presents an inexpensive technique to transform an ordinary screen into a…


Established: November 15, 2000

Transforming an ordinary piece of paper into a wireless mobile input device. Virtual mouse, keyboard and 3D controller with an ordinary piece of paper. Abstract: In many intelligent environments, instead of using conventional mice, keyboards and joysticks, people are looking for an…

Re-rendering from a Sparse Set of Images

Established: June 1, 2001

We present a framework for view-dependent rendering from arbitrary viewpoints and relighting under novel illumination conditions of a real object from a sparse set of images and a pre-acquired geometric model of the object. Using a 3D model and a…

Eye-Gaze Correction for Video Telecommunications

Established: May 1, 2002

The lack of eye contact in desktop video teleconferencing substantially reduces the effectiveness of video contents. While expensive and bulky hardware is available on the market to correct eye gaze, researchers have been trying to provide a practical software-based solution…

A Flexible New Technique for Camera Calibration

Established: December 2, 1999

We propose a flexible new technique to easily calibrate a camera. It is well suited for use without specialized knowledge of 3D geometry or computer vision. The technique only requires the camera to observe a planar pattern shown at a…

Emotion Recognition for MS Cognitive Services

Established: July 1, 2015

Emotion Recognition takes an image with faces as input and returns the confidence across a set of emotions for each face in the image, as well as the bounding box for the face (using MS Face API). The algorithm infers emotions from appearance…
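To make the shape of the output described above concrete, here is a minimal sketch of a per-face result holding a bounding box and a confidence per emotion, plus a helper that picks the most confident emotion. The class name `FaceEmotion`, the field names, the box layout, and the emotion labels are hypothetical and chosen only for illustration; this is not the actual response format of the service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FaceEmotion:
    # Bounding box as (left, top, width, height) in pixels; this layout is an assumption.
    box: Tuple[int, int, int, int]
    # Confidence per emotion label; the labels used below are illustrative only.
    scores: Dict[str, float]

    def top_emotion(self) -> Tuple[str, float]:
        """Return the label with the highest confidence and that confidence."""
        label = max(self.scores, key=self.scores.get)
        return label, self.scores[label]


def summarize(faces: List[FaceEmotion]) -> List[str]:
    """Build one readable line per detected face."""
    lines = []
    for face in faces:
        label, conf = face.top_emotion()
        lines.append(f"face at {face.box}: {label} ({conf:.2f})")
    return lines


# Hypothetical result for an image containing two faces.
example = [
    FaceEmotion(box=(10, 20, 64, 64),
                scores={"happiness": 0.91, "neutral": 0.07, "surprise": 0.02}),
    FaceEmotion(box=(120, 30, 60, 60),
                scores={"happiness": 0.10, "neutral": 0.85, "surprise": 0.05}),
]

for line in summarize(example):
    print(line)
```

Keeping the full score dictionary, rather than only the top label, lets a caller apply its own threshold or treat low-confidence faces as undecided.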

Eye-Gaze Tracking for Improved Natural User Interaction

Established: April 12, 2014

We develop novel eye-gaze tracking technologies to make eye-gaze tracking ubiquitously available for improved natural user interaction (NUI). In particular, we investigate two approaches: Active IR lighting: We investigate the possibility of using multiple IR lights…

ViiBoard: Vision-enhanced Immersive Interaction with Touch Board

Established: April 11, 2014

ViiBoard uses vision techniques to significantly enhance the user experience on large touch displays (e.g. Microsoft Perceptive Pixel) in two areas: human computer interaction and immersive remote collaboration. Simple Setup ViiBoard uses only an RGBD camera (Microsoft Kinect), mounted on…

Mobile Surface

Established: March 1, 2010

Mobile Surface is a novel interaction system for mobile computing. Our goal is to bring the Microsoft Surface experience to mobile scenarios, and more importantly, to enable 3D interaction with mobile devices. We do research on how to transform any surface (e.g.,…

Personal Telepresence Station

Established: July 29, 2008

With globalization and workforce mobility, there is a strong need for research and development of advanced infrastructures and tools that bring an immersive experience to teleconferencing, so that people across geographically distributed sites can interact collaboratively. The Personal Telepresence Station project aims…


Established: August 6, 2002

Digital Technology for Effective Whiteboard Use. Introduction: The whiteboard is ubiquitous and will exist for the foreseeable future, but its content is hard to archive and share. While digital cameras can be used to capture whiteboard content, the image is usually taken…

Face Modeling

Established: May 5, 2001

Overview: 3D Modeling. Generating realistic 3D human face models and facial animations has been a persistent challenge in computer vision and graphics. We have developed a system that constructs textured 3D face models from videos with minimal user interaction. Our…


Computer Vision and Intelligent Services


July 28, 2015


Baining Guo, Martial Hebert, Tao Mei, and Zhengyou Zhang


Microsoft Research, Carnegie Mellon University, Microsoft



Short Bio

Full version of his résumé is available by clicking here.

Zhengyou Zhang is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) (2005, for contributions to robust computer vision techniques) and a Fellow of the Association for Computing Machinery (ACM) (2013, for contributions to computer vision and multimedia). He is the Founding Editor-in-Chief of the newly established IEEE Transactions on Autonomous Mental Development (IEEE T-AMD), and is on the Editorial Board of the International Journal of Computer Vision (IJCV), Machine Vision and Applications, and the Journal of Computer Science and Technology (JCST). He was on the Editorial Board of the IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE T-PAMI) from 1999 to 2005, the IEEE Transactions on Multimedia (IEEE T-MM) from 2004 to 2009, and the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) from 1997 to 2008, among others. He is listed in Who’s Who in the World, Who’s Who in America and Who’s Who in Science and Engineering.

Before joining Microsoft, Zhengyou worked at INRIA (French National Institute for Research in Computer Science and Control) for 11 years, where he was a Senior Research Scientist from 1991 and worked in the Computer Vision and Robotics group. In 1996-1997, he spent a one-year sabbatical as an Invited Researcher at the Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan.

He holds more than 130 US patents and has about 20 patents pending. He also holds a few Japanese patents for his inventions during his sabbatical at ATR.

He has published over 200 papers in refereed international journals and conferences, and is the author of the following books

He has edited multiple books, including

He is a General Co-Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), June 2017, Puerto Rico, USA.

He is a General Co-Chair of the ACM International Conference on Multimodal Interaction (ICMI 2015), Nov. 3-9, 2015, Seattle, USA.

He was a General Co-Chair of the International Workshop on Multimedia Signal Processing (MMSP 2014), September 22-24, 2014, Jakarta, Indonesia.

He was a General Co-Chair of the International Workshop on Multimedia Signal Processing (MMSP 2011), October 17-19, 2011, Hangzhou, China.

He was the Chair of the new Technical Briefs program of SIGGRAPH Asia, Singapore, November 28 – December 1, 2012.

He was a Program Co-Chair of the International Conference on Multimedia and Expo (ICME), July 2010, a Program Co-Chair of the ACM International Conference on Multimedia (ACM MM), October 2010, and a Program Co-Chair of the ACM International Conference on Multimodal Interfaces (ICMI), November 2010. He was the Program Co-Chair of the 8th International Conference on Development and Learning (ICDL09), June 5-7, 2009, Shanghai, China. He was a Technical Co-Chair of the International Workshop on Multimedia Signal Processing (MMSP06), October 3-6, 2006, Victoria, BC, Canada. He was the Program Co-Chair of the Asian Conference on Computer Vision (ACCV2004), Jan. 27-30, 2004, Jeju Island, Korea; a Demo Chair and an Area Chair of the International Conference on Computer Vision (ICCV2003), Oct. 14-17, 2003, Nice, France; and the Demo Chair of the International Conference on Computer Vision (ICCV2005), Oct. 15-21, 2005, Beijing, China. He co-organized the International Workshop on Multimedia Technologies in E-Learning and Collaboration, held in Nice, France, on October 17, 2003. He served on the Program Committees of ICCV, CVPR, ECCV, ACCV and many other international conferences and workshops.

He was a co-organizer of the First International Workshop on Human Activity Understanding from 3D Data (HAU3D) 2011, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, June 20-25, 2011; a co-organizer of the Second International Workshop on Human Activity Understanding from 3D Data (HAU3D) 2012, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, June 16-21, 2012; and a co-organizer of the Third International Workshop on Human Activity Understanding from 3D Data (HAU3D) 2013, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon, June 25-27, 2013.

Zhengyou Zhang was a member of the IEEE Computer Society Fellows Committee from 2005 to 2007 and again in 2010 and 2011, a member of the IEEE Technical Committee on Multimedia Signal Processing (2006-2010), and is the ex-Chair of the IEEE Technical Committee on Autonomous Mental Development (2007-2009).

Interview by the Computational Intelligence Magazine is available here (or go to IEEE Xplore).

Interview by the IEEE Signal Processing Magazine on “Telepresence: Virtual Reality in the Real World” (or go to IEEE Xplore). (November 2011)

Natural User Interfaces: What’s Next? Video on 3D Photorealistic Talking Head. (February 2011)



Published in Image and Vision Computing Journal, Vol.15, No.1, pages 59-76, 1997.

Collaborators, Post-Doctoral Researchers and Students

  • Zicheng Liu (Researcher, MSR)
  • Mike Sinclair (Principal Researcher, MSR)
  • Li-wei He (Research Engineer, MSR)
  • Cha Zhang (Researcher, MSR)
  • Rajesh Hegde (Research Engineer, MSR)
  • Dinei Florencio (Researcher, MSR)
  • Qin Cai (Research Engineer, MSR)
  • Wei-ge Chen (Software Architect, MSR)
  • Phil Chou (Principal Researcher, MSR)
  • Ying Shan (Post-Doc, now Scientist at Microsoft Online)
  • Gang Hua (Scientist at Nokia Research)
  • Ming-Ting Sun (Professor, University of Washington)
  • Wanqing Li (Associate Professor, University of Wollongong)
  • Chunhui Zhang (Researcher, MSR Asia, now at Alibaba)
  • John Hershey (Post-Doc, now at IBM Research)

Interns: Sasa Junuzovic (2008), Matt Luciw (2008), Aswin Sankaranarayanan (2008), Xiaogang Wang (2008), Qing Zhang (2008), Raffay Hamid (2007), Sasa Junuzovic (2007), Miao Liao (2007), Mingxuan Sun (2007), Qi Zhao (2007), Amar Subramanya (2006), Sasa Junuzovic (2006), Ming Liu (2005), Gang Hua (2005), Amar Subramanya (2004), Ya Chang (2004), Yanli Zheng (2003), Hanning Zhou (2003), Guodong Guo (2002), Ruigang Yang (2001), Ying Wu (2000), Ko Nishino (1999, 2000), Qifa Ke (1998)

Supervision of researchers when I was at INRIA: Nassir Navab (Ph.D., 1993), Michel Buffa (Ph.D., 1993), Gabriella Csurka (Ph.D., 1996), Bernard Hotz (Research Engineer, 1991-1994), Serge Saracco (Master, 1993), Jean-Francois Ponthieux (Master, 1993), Veit Schenk (Master, 1996), Laurence Lucido (Ph.D., 1997), Sylvain Bougnoux (Ph.D., 1998).