Natural Language Processing

Established: June 27, 2016

The Redmond-based Natural Language Processing group is focused on developing efficient algorithms to process texts and to make their information accessible to computer applications. Since text can contain information at many different granularities, from simple word or token-based representations, to rich hierarchical syntactic representations, to high-level logical representations across document collections, the group seeks to work at the right level of analysis for the application concerned.

The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like “Flying planes can be dangerous”. Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should “can” be analyzed as a verb or as a noun? Which of the many possible meanings of “plane” is relevant? Depending on context, “plane” could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
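To give a rough sense of how quickly such ambiguity multiplies, here is a minimal sketch that enumerates every part-of-speech assignment for the example sentence. The lexicon `TOY_LEXICON` and the function `candidate_taggings` are hypothetical and hand-written for illustration; they are not part of any Microsoft tool, and a real system would use a far larger lexicon and would rank the candidates rather than merely list them.

```python
from itertools import product

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to the parts of speech it could plausibly take.
TOY_LEXICON = {
    "flying": ["VERB", "ADJ"],
    "planes": ["NOUN", "VERB"],
    "can": ["AUX", "NOUN", "VERB"],
    "be": ["VERB"],
    "dangerous": ["ADJ"],
}

def candidate_taggings(sentence):
    """Return every combination of tags the toy lexicon allows."""
    words = sentence.lower().split()
    options = [TOY_LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]

taggings = candidate_taggings("Flying planes can be dangerous")
print(len(taggings))  # 2 * 2 * 3 * 1 * 1 = 12 candidate taggings
```

Even with five words and a tiny lexicon there are twelve candidate analyses before word senses (airplane, geometric plane, woodworking tool) are considered, which is why context and world knowledge are needed to choose among them.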

We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; Intellishrink uses natural language technology to compress cellphone messages; Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to enable any area where human users can benefit by communicating with their computers in a natural way.




Visual Storytelling
Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), June 13, 2016

NCI-PID-PubMed Genomics Knowledge Base Completion Dataset

October 2016

This dataset includes a database of regulation relationships among genes and corresponding textual mentions of pairs of genes in PubMed article abstracts.




Detecting Fake Reviews


October 16, 2012


Bing Liu


University of Illinois at Chicago (UIC)


NW-NLP 2012 Afternoon Talks


May 11, 2012


Emily Prud'hommeaux, Congle Zhang, and Max Whitney


OHSU, University of Washington, Simon Fraser University


NW-NLP 2012 Morning Talks


May 11, 2012


Matt Hohensee, Anthony Stark, Shafiq Joty, and Ryan Georgi


University of Washington, OHSU, University of British Columbia


UW/MS symposium


June 6, 2008


Danyel Fisher, Douglas Downey, Chris Quirk, Scott Drellishak, Kelly O'Hara, Emily M. Bender, Sumit Basu, Matthew Hurst, Arnd Christian König, Michael Gamon, Chris Brockett, Dmitriy Belenko, Bill Dolan, Jianfeng Gao, and Lucy Vanderwende


Intelligent Editing

The Intelligent Editing Project seeks to apply neural networks and other modern machine learning techniques to furnish editorial assistance. We look beyond traditional grammatical error checking and focus on giving writers fluent, meaningful text editing support that is appropriate to their objectives and their targeted readership. Our interests include sentence compression and summarization, paraphrasing and stylistic variation, and writing assistance for non-native writers. The MSR Abstractive Text Compression Dataset described in our…

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. Human judges consider the generated captions to be as good as or better than human-written captions 34% of the time.
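For readers unfamiliar with the BLEU-4 metric quoted here, the following is a minimal sketch of a sentence-level variant: the geometric mean of clipped 1- to 4-gram precisions multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing. The official COCO evaluation is corpus-level and uses multiple references, so this function (`bleu4`, a name chosen for illustration) will not reproduce the reported score.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Simplified sentence-level BLEU-4 against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates that are not longer than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / 4)

reference = "a man rides a horse on the beach"
print(bleu4("a man rides a horse on a beach", reference))
```

A caption identical to the reference scores 1.0, and a caption sharing no words with it scores 0.0; real captions fall in between.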


NLPwin

Established: October 3, 2014

An introduction by Lucy Vanderwende, on behalf of everyone who contributed to the development of NLPwin. NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started in 1991, just as Microsoft inaugurated the Microsoft Research group; while active development of NLPwin continued through 2002, it is still being updated regularly, primarily in service of Machine Translation. NLPwin was and is still…

Data-Driven Conversation

Established: June 1, 2014

This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by which devices can learn from human-human conversational interactions and can situate responses in the verbal context and in physical or virtual environments. Natural and Engaging Agents that process human language will play a growing role…

Dialog and Conversational Systems Research

Established: March 14, 2014

Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world. Projects: Spoken Language Understanding


Statistical Parsing and Linguistic Analysis Toolkit

Established: April 4, 2012

Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit. Its main goal is to allow easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic analysis tools, such as part-of-speech taggers and parsers, and more recent developments, such as sentiment analysis (identifying whether a particular piece of text has positive or negative sentiment towards its focus).…

Microsoft Research ESL Assistant

Established: May 9, 2008

The Microsoft Research ESL Assistant is a web service that provides correction suggestions for typical ESL (English as a Second Language) errors. Such errors include, for example, the choice of determiners (the/a) and the choice of prepositions. The web service also provides word choice suggestions from a thesaurus. In order to help the user make decisions on whether to accept a suggestion, the service displays "before and after" web search…

Blews – what the blogosphere tells you about news

Established: February 18, 2008

While typical news-aggregation sites do a good job of clustering news stories according to topic, they leave the reader without information about which stories figure prominently in political discourse. BLEWS uses political blogs to categorize news stories according to their reception in the conservative and liberal blogospheres. It visualizes information about which stories are linked to from conservative and liberal blogs, and it indicates the level of emotional charge in the discussion of the news…


MindNet

Established: December 19, 2001

MindNet is a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text. MindNets are produced by a fully automatic process that takes the input text, sentence-breaks it, parses each sentence to build a semantic dependency graph (Logical Form), aggregates these individual graphs into a single large graph, and then assigns probabilistic weights to subgraphs based on their frequency in the corpus as a whole. The…
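The shape of that pipeline (sentence-break, parse, aggregate, weight) can be sketched in a few lines. Everything below is an assumption made for illustration: `toy_parse` is a hypothetical stand-in that only handles three-word subject-verb-object sentences, not the group's broad-coverage parser, and weights are relative frequencies of single edges rather than of larger subgraphs.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence breaker: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def toy_parse(sentence):
    """Hypothetical stand-in parser: reads a three-word sentence as
    subject-verb-object and returns (head, relation, dependent) edges."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if len(words) != 3:
        return []  # anything else is outside this toy grammar
    subject, verb, obj = words
    return [(verb, "subject", subject), (verb, "object", obj)]

def build_weighted_graph(text):
    """Aggregate per-sentence edges and weight each by relative frequency."""
    edge_counts = Counter()
    for sentence in split_sentences(text):
        edge_counts.update(toy_parse(sentence))
    total = sum(edge_counts.values())
    return {edge: count / total for edge, count in edge_counts.items()} if total else {}

graph = build_weighted_graph("Birds eat seeds. Birds eat worms. Cats chase birds.")
for edge, weight in sorted(graph.items(), key=lambda item: -item[1]):
    print(edge, round(weight, 3))
```

In this toy corpus the edge linking "eat" to its subject "birds" occurs twice out of six edges, so it receives the highest weight; the same idea, applied to real parses of dictionaries and encyclopedias, is what lets frequent relations stand out in the aggregated graph.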

Microsoft Research blog

Microsoft NLP researchers converge at ACL 2016, edging ever closer to human-like conversational experiences

By Bill Dolan, Principal Researcher, Microsoft Research

This year, the annual meeting of the Association for Computational Linguistics (ACL) will be held in Berlin, Germany, August 7-12, 2016, at Humboldt University. ACL is the premier conference on natural language processing (NLP) systems and computational linguistics. As a Gold sponsor, Microsoft is proud to have more than 20 researchers attending and presenting at ACL. Along with my colleagues in the Natural Language Processing and Speech group,…

August 2016


Speaking in Someone Else’s Language

Springtime on the sun-drenched Amalfi Coast. A perfect little café perched high above the sea, the scent of jasmine and lemon blossoms wafting past. You open the menu, hungry for lunch. Oh, wait—you don’t know any Italian. Now what? Not to worry: Just whip out your Windows Phone, hover the camera over the menu, and the Bing Translator app will tell you what your choices are. Better yet, you can speak into the phone’s mic,…

September 2013


Software Aids Language Learners

By Gary Alt, Writer, Microsoft

Imagine mining the web to learn a language. No, not the jargon of webspeak, where IMHO means “in my humble opinion” or F2F is “face to face,” but real, spoken languages, such as Spanish, Hindi, or Japanese. That’s the notion that intrigued Ming Zhou, Matt Scott, and their colleagues at Microsoft Research Asia as they studied how the web’s zillions of words, in scores of languages, could be utilized for…

September 2010


Translator Fast-Tracks Haitian Creole

By Janie Chang, Writer, Microsoft Research

In disaster relief, every hour makes a difference, and communication is essential. When aid efforts began after the recent Haiti earthquake, a request came to the Machine Translation team within Microsoft Research’s Natural Language Processing (NLP) group from Microsoft volunteers involved in the community supporting assistance in Haiti: Was there a quick way to deliver an online English/Haitian Creole translator? The request to the team came on Tuesday, Jan.…

February 2010


Translating the Web for the Entire World

By Rob Knies, Managing Editor, Microsoft Research

People all over the world use the Internet every day, to purchase goods or services, to search for information, to find diversions. But is the World Wide Web truly worldwide? It’s difficult to make the case. Estimates claim that approximately 70 percent of Web pages today are created in the English language, while the percentage of non-English speakers is growing faster than that of English speakers. So what…

March 2008
