Visual Computing

Established: July 3, 2010


Computer vision is an exciting research area that studies how to make computers efficiently perceive, process, and understand visual data such as images and videos. The ultimate goal is for computers to emulate the remarkable perceptual capabilities of the human eye and brain, or even to surpass and assist humans in certain ways. The Visual Computing Group at Microsoft Research Asia consists of an elite team of researchers and engineers whose expertise spans the entire spectrum of research topics in computer vision: from mathematical theory to practical applications, from physical systems to software development, and from low-level image processing to high-level image understanding. Research results from our group have had a fundamental impact on many important applications, such as new high-resolution cameras, face recognition, image search, Virtual Earth, and graphics and games.

More specifically, our research activities center on five main thrusts:

  1. Imaging and Photogrammetry, including high-resolution cameras, radiometric calibration, photometric stereo, 3D imaging and video, and image and video enhancement.
  2. Pattern Recognition and Statistical Learning, including data clustering and classification, manifold learning, and high-dimensional geometry and statistics.
  3. Object Detection and Recognition, including face detection, alignment, and tagging; video-based face recognition; and sparsity-based robust face recognition.
  4. Dynamical Vision, including object tracking, video motion analysis and editing, video summarization, video motion and object segmentation, and dynamical photometric stereo.
  5. Interactive and Internet Vision, including interactive image segmentation, completion, and normal reconstruction; image search and re-ranking; large-scale image and object retrieval; and visualization of large image collections.

Group News & Activities

  • Tutorial on Robust PCA and Its Applications by John, Zhouchen, and Yi at the International Conference on Image Processing, Hong Kong, September 2010.
  • Tutorial on Photometric Stereo by Yasuyuki, Bennett, and Moshe at the International Conference on Image Processing, Hong Kong, September 2010.
  • Group manager Yi Ma was interviewed by CNN for the article “Why Face Recognition isn’t Scary — yet,” July 9, 2010.
  • Our group published 15 papers at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010.
  • Group manager Yi Ma gave a plenary speech at the International Conference on Visual Communications and Image Processing, July 2010.
  • Microsoft Research featured story “Yi Ma and the Blessing of Dimensionality.” May 28, 2010.
  • Our researcher Moshe Ben-Ezra has developed a new gigapixel digital camera.

Highlighted Projects

A Glimpse at Several Representative Projects:

  • A Gigapixel Digital Camera (by Dr. Moshe Ben-Ezra): This camera is a state-of-the-art, commercially affordable (under $25K) solution for high-quality, high-resolution imaging. It produces high-quality images at a resolution of 1.6 gigapixels and has been used to digitize artworks and antiques in unprecedented detail. For example, when combined with photometric stereo, images captured by this camera can recover striking 3D surface details of oil paintings, helping to reveal the artist’s skill and style. The camera has broad applications in cultural heritage, archaeology, and art preservation and insurance.
  • Robust Processing and Analysis of High-Dimensional Data (John Wright, Zhouchen Lin, Yi Ma): The need to detect and correct gross errors and outliers arises in problems throughout computational data analysis. For example, in many computer vision problems erroneous measurements arise from occlusion, tracker failure, or violations of an assumed model (e.g., specularities in face recognition or photometric stereo). Correctly handling such non-ideal observations is essential to building systems that work under real-world conditions. We are working to meet this need with new algorithmic tools based on convex optimization. These algorithms are scalable and efficient, and come with sharp performance guarantees based on concentration of measure in high-dimensional spaces. These tools have had a major impact on important problems such as highly robust face recognition and robust principal component analysis (PCA).
  • Photo Album Management – Face Tagging (Fang Wen, Jian Sun): People now take huge numbers of photos in their daily lives. The ultimate goal of our photo album management work is to help users easily manage, search, share, and enjoy these photos. “Who is in the photo” is a valuable cue for organizing and sharing photos. However, tagging people’s names is a tedious job for the user. Our Face Tagging work combines state-of-the-art face recognition and clustering technologies with a friendly user interface to make tagging effortless and fun.
  • Video Analysis and Synthesis (Yichen Wei, Yasuyuki Matsushita): We work on the analysis, browsing, and automated synthesis of videos. Video is an important medium that is becoming increasingly popular as video cameras become more widely available. Many previous approaches are natural extensions of image analysis and synthesis, and they often disregard the dynamics in the video. We are interested in these dynamics and use them for further analysis and applications. Past work includes video stabilization, video completion, video object tracking, and global motion analysis. We are developing new technologies that capitalize on the visual dynamics in videos.
  • Interactive Image Segmentation and Cut-Out (Jian Sun): Efficient, interactive foreground/background segmentation of still images is of great practical importance in image editing. As research outputs, we have developed a scribble-based tool (Lazy Snapping) and a painting-based tool (Paint Selection). With these tools, the user can effortlessly select an object or region of interest with minimal input, for applications ranging from object cut-and-paste to local color/tone adjustment.
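To make the photometric stereo mentioned above a little more concrete, here is a minimal sketch of the classic Lambertian formulation, which we assume for illustration: under a distant light with unit direction l, a pixel with albedo ρ and unit normal n has intensity I = ρ (n · l), so with three or more known lights the scaled normal g = ρn can be recovered per pixel by linear least squares. This is not the group's actual pipeline; the function name `photometric_stereo` is hypothetical, and the sketch ignores shadows and specularities, which are exactly the kind of model violations that robust methods are meant to handle.

```python
import numpy as np


def photometric_stereo(L, I):
    """Classic Lambertian photometric stereo (illustrative sketch).

    L: (k, 3) array of unit light directions, k >= 3.
    I: (k, p) array of observed intensities for p pixels.
    Returns albedo with shape (p,) and unit normals with shape (p, 3).
    Assumes no shadows, no specularities, and known light directions.
    """
    # Solve L @ G = I in the least-squares sense; column j of G is albedo_j * normal_j.
    G, *_ = np.linalg.lstsq(L, I, rcond=None)
    # The length of each scaled normal is the albedo.
    albedo = np.linalg.norm(G, axis=0)
    # Normalize to unit normals, guarding against zero-albedo pixels.
    normals = (G / np.where(albedo > 0, albedo, 1.0)).T
    return albedo, normals
```

In practice, each image contributes one row of `I` per light, and the recovered normal field can then be integrated into a surface to reveal fine relief such as brush strokes.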
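The robust PCA problem mentioned above is commonly posed as principal component pursuit: given an observed matrix D, find a low-rank matrix A and a sparse error matrix E with D = A + E by minimizing ‖A‖* + λ‖E‖₁ (nuclear norm plus weighted ℓ1 norm). The sketch below solves this convex program with an inexact augmented Lagrange multiplier (ALM) scheme, alternating singular value thresholding for A with elementwise soft-thresholding for E. It is a hedged illustration, not the group's released code: the function names and the parameter choices (λ = 1/√max(m, n), ρ = 1.5, the tolerance) are assumptions based on common defaults.

```python
import numpy as np


def shrink(X, tau):
    """Elementwise soft-thresholding: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value.
    return (U * shrink(s, tau)) @ Vt


def robust_pca(D, lam=None, tol=1e-7, max_iter=500):
    """Split D into low-rank A plus sparse E by (approximately) solving
        min ||A||_* + lam * ||E||_1   subject to   D = A + E
    with an inexact augmented Lagrange multiplier method (illustrative sketch)."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # assumed default weight
    norm_two = np.linalg.norm(D, 2)      # largest singular value of D
    norm_fro = np.linalg.norm(D, "fro")
    # Initialize the dual variable so that its dual norm is at most 1.
    Y = D / max(norm_two, np.abs(D).max() / lam)
    mu = 1.25 / norm_two                 # penalty parameter (assumed schedule)
    mu_max = mu * 1e7
    rho = 1.5
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)   # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)  # sparse update
        Z = D - A - E                       # constraint residual
        Y = Y + mu * Z                      # dual ascent step
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") / norm_fro < tol:
            break
    return A, E


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A rank-3 matrix corrupted by sparse, large-magnitude errors on about 5% of entries.
    L0 = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 60))
    S0 = np.zeros((60, 60))
    mask = rng.random((60, 60)) < 0.05
    S0[mask] = rng.uniform(-10, 10, mask.sum())
    A, E = robust_pca(L0 + S0)
    print("relative error of low-rank part:", np.linalg.norm(A - L0) / np.linalg.norm(L0))
```

Unlike classical PCA, which a single grossly corrupted entry can distort arbitrarily, this formulation isolates such outliers in E; the same idea applies when, say, specular highlights corrupt an otherwise low-rank stack of face or photometric-stereo images.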