Computer vision

Enabling computers and devices to understand what they see.

Computer vision researchers across Microsoft build algorithms and systems that automatically analyze imagery and extract knowledge from the visual world. This knowledge feeds further research, such as transforming depth and scene data into three-dimensional renderings and intelligently combining labels for people, places and things into scene descriptions and calls to action.

Computer vision is multidisciplinary and routinely supplies visual processing and analysis components to ambitious projects such as the development of personal robots, self-driving cars and autonomous drones. Artificial intelligence and machine learning are widely used to automate computer vision tasks such as 3D recovery, facial and object recognition, image and video captioning, biometric security, medical imaging and video enhancement.

Focus Areas


Computational photography

Developing algorithms and tools for enriched visual experiences such as photo enhancement, panorama creation, image and video completion, style transfer and face beautification.

Object segmentation, classification and recognition

Pioneering techniques that label imagery at different levels of detail, from whole images down to individual objects and pixels, to enable applications such as visual search, image captioning and the creation of dynamic images.

Image and language understanding

Researching methods and developing models that automatically describe the content of images and videos in natural language, and that generate imagery from descriptive text.

3D modeling

Developing new tools and techniques to create three-dimensional models of scenes and spaces from images and videos.

Video analytics

Developing techniques to analyze videos for applications such as content summarization, object detection and action recognition.

Interactive vision

Developing methods that balance automated computer vision with the subjective nature of human judgement.

Face and gestures

Developing new image and video analysis techniques to recognize people, infer emotions and interpret gestures.