Natural Language Processing

Established: June 27, 2016

The Redmond-based Natural Language Processing group is focused on developing efficient algorithms to process texts and to make their information accessible to computer applications. Since text can contain information at many different granularities, from simple word or token-based representations, to rich hierarchical syntactic representations, to high-level logical representations across document collections, the group seeks to work at the right level of analysis for the application concerned.

The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like “Flying planes can be dangerous”. Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should “can” be analyzed as a verb or as a noun? Which of the many possible meanings of “plane” is relevant? Depending on context, “plane” could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
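As a rough illustration of the structural ambiguity described above, the sketch below counts how many parse trees a toy grammar assigns to “Flying planes can be dangerous”. The grammar, the lexicon, and the function name `count_parses` are assumptions made up for this example; they are not part of the group's software, and a broad-coverage parser would be far larger.

```python
from collections import defaultdict

# Hypothetical toy lexicon: each word maps to the parts of speech it may take.
# "flying" is either an adjective ("planes that fly") or a gerund
# ("the act of flying planes"); "can" is either an auxiliary or a noun.
LEXICON = {
    "flying": {"Adj", "Ger"},
    "planes": {"N"},
    "can": {"Aux", "N"},
    "be": {"Cop"},
    "dangerous": {"Adj"},
}

# Hypothetical binary rules: (left child, right child) -> possible parents.
RULES = {
    ("NP", "VP"): {"S"},
    ("Adj", "N"): {"NP"},    # flying planes = planes that fly
    ("Ger", "N"): {"NP"},    # flying planes = the act of flying them
    ("Aux", "Pred"): {"VP"},
    ("Cop", "Adj"): {"Pred"},
}


def count_parses(words, goal="S"):
    """Count the parse trees rooted in `goal` using a CKY-style chart."""
    n = len(words)
    # chart[(i, j)][symbol] = number of trees for `symbol` over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in list(chart[(i, k)].items()):
                    for right, right_count in list(chart[(k, j)].items()):
                        for parent in RULES.get((left, right), ()):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][goal]


sentence = "flying planes can be dangerous".split()
print(count_parses(sentence))  # 2: one tree per reading of "flying planes"
```

Even this five-word sentence yields two trees under a five-rule grammar. The noun reading of “can” is considered and discarded only because no rule happens to use it, which hints at how quickly the search space grows with a realistic grammar and lexicon.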

We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; Intellishrink uses natural language technology to compress cellphone messages; Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to enable any area where human users can benefit by communicating with their computers in a natural way.





Detecting Fake Reviews


October 16, 2012


Bing Liu


University of Illinois at Chicago (UIC)


NW-NLP 2012 Morning Talks


May 11, 2012


Matt Hohensee, Anthony Stark, Shafiq Joty, and Ryan Georgi


University of Washington, OHSU, University of British Columbia


NW-NLP 2012 Afternoon Talks


May 11, 2012


Emily Prud'hommeaux, Congle Zhang, and Max Whitney


OHSU, University of Washington, Simon Fraser University


UW/MS symposium


June 6, 2008


Danyel Fisher, Douglas Downey, Chris Quirk, Scott Drellishak, Kelly O'Hara, Emily M. Bender, Sumit Basu, Matthew Hurst, Arnd Christian König, Michael Gamon, Chris Brockett, Dmitriy Belenko, Bill Dolan, Jianfeng Gao, and Lucy Vanderwende


Intelligent Editing

The Intelligent Editing Project seeks to apply neural networks and other modern machine learning techniques to furnish editorial assistance. We look beyond traditional grammatical error checking and focus on helping writers by providing them with fluent, meaningful text editing support that…

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a…

Data-Driven Conversation

This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by…


NLPwin

Established: October 3, 2014

An introduction by Lucy Vanderwende, on behalf of everyone who contributed to the development of NLPwin.

NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started…

Dialog and Conversational Systems Research

Established: March 14, 2014

Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety…


Statistical Parsing and Linguistic Analysis Toolkit

Established: April 4, 2012

Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit. Its main goal is to allow easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic…

Microsoft Research ESL Assistant

Established: May 9, 2008

The Microsoft Research ESL Assistant is a web service that provides correction suggestions for typical ESL (English as a Second Language) errors. Such errors include, for example, the choice of determiners (the/a) and the choice…

Blews – what the blogosphere tells you about news

Established: February 18, 2008

While typical news-aggregation sites do a good job of clustering news stories according to topic, they leave the reader without information about which stories figure prominently in political discourse. BLEWS uses political blogs to categorize news stories according to their…

Machine Translation

Established: January 18, 2002

The principal focus of the Natural Language Processing group is to build a machine translation system that automatically learns translation mappings from bilingual corpora. The Machine Translation (MT) project at Microsoft Research is focused on creating MT systems and technologies…


MindNet

Established: December 19, 2001

MindNet is a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text. MindNets are produced by a fully automatic process that takes the input text, sentence-breaks it, parses each sentence…


Speaking in Someone Else’s Language

Springtime on the sun-drenched Amalfi Coast. A perfect little café perched high above the sea, the scent of jasmine and lemon blossoms wafting past. You open the menu, hungry for lunch. Oh, wait—you don’t know any Italian. Now what? Not…

September 2013

Microsoft Research Blog

Software Aids Language Learners

By Gary Alt, Writer, Microsoft

Imagine mining the web to learn a language. No, not the jargon of webspeak, where IMHO means “in my humble opinion” or F2F is “face to face,” but real, spoken languages, such as Spanish, Hindi,…

September 2010

Microsoft Research Blog

Translator Fast-Tracks Haitian Creole

By Janie Chang, Writer, Microsoft Research

In disaster relief, every hour makes a difference, and communication is essential. When aid efforts began after the recent Haiti earthquake, a request came to the Machine Translation team within Microsoft Research’s Natural Language…

February 2010

Microsoft Research Blog

Translating the Web for the Entire World

By Rob Knies, Managing Editor, Microsoft Research

People all over the world use the Internet every day, to purchase goods or services, to search for information, to find diversions. But is the World Wide Web truly worldwide? It’s difficult to…

March 2008

Microsoft Research Blog