Natural Language Processing

Established: June 27, 2016

The Redmond-based Natural Language Processing group is focused on developing efficient algorithms to process texts and to make their information accessible to computer applications. Since text can contain information at many different granularities, from simple word or token-based representations, to rich hierarchical syntactic representations, to high-level logical representations across document collections, the group seeks to work at the right level of analysis for the application concerned.

Overview
The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like “Flying planes can be dangerous”. Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should “can” be analyzed as a verb or as a noun? Which of the many possible meanings of “plane” is relevant? Depending on context, “plane” could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
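To make the scale of the problem concrete, the following is a minimal sketch that counts the word-level readings of the example sentence. It assumes a hypothetical toy lexicon (the name TOY_LEXICON and its entries are invented for illustration and are not drawn from any resource used by the group), and it enumerates combinations only; it does not attempt to choose among them.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the candidate readings
# mentioned in the paragraph above. Illustrative only.
TOY_LEXICON = {
    "flying": ["gerund (the act of flying planes)", "modifier (planes that are flying)"],
    "planes": ["airplane", "geometric object", "woodworking tool"],
    "can": ["verb", "noun"],
    "be": ["verb"],
    "dangerous": ["adjective"],
}


def candidate_readings(sentence):
    """Return every combination of per-word readings for the sentence."""
    words = sentence.lower().split()
    options = [TOY_LEXICON[word] for word in words]
    # The Cartesian product pairs each word with one of its readings.
    return [dict(zip(words, combo)) for combo in product(*options)]


readings = candidate_readings("Flying planes can be dangerous")
print(len(readings))  # 2 * 3 * 2 * 1 * 1 = 12
```

Even with this tiny lexicon a five-word sentence yields twelve combinations, most of which a human reader never consciously considers. Ruling out the implausible ones is where world knowledge and context come in.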

We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; Intellishrink uses natural language technology to compress cellphone messages; Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to enable any area where human users can benefit by communicating with their computers in a natural way.

Publications

2016

Visual Storytelling
Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 2016, ACL – Association for Computational Linguistics, June 13, 2016

Downloads

NCI-PID-PubMed Genomics Knowledge Base Completion Dataset

October 2016

This dataset includes a database of regulation relationships among genes and corresponding textual mentions of pairs of genes in PubMed article abstracts.

Videos

Detecting Fake Reviews

Date

October 16, 2012

Speakers

Bing Liu

Affiliation

University of Illinois at Chicago (UIC)

NW-NLP 2012 Morning Talks

Date

May 11, 2012

Speakers

Matt Hohensee, Anthony Stark, Shafiq Joty, and Ryan Georgi

Affiliation

University of Washington, OHSU, University of British Columbia

NW-NLP 2012 Afternoon Talks

Date

May 11, 2012

Speakers

Emily Prud'hommeaux, Congle Zhang, and Max Whitney

Affiliation

OHSU, University of Washington, Simon Fraser University

UW/MS symposium

Date

June 6, 2008

Speakers

Danyel Fisher, Douglas Downey, Chris Quirk, Scott Drellishak, Kelly O'Hara, Emily M. Bender, Sumit Basu, Matthew Hurst, Arnd Christian König, Michael Gamon, Chris Brockett, Dmitriy Belenko, Bill Dolan, Jianfeng Gao, and Lucy Vanderwende

Projects

Intelligent Editing

The Intelligent Editing Project seeks to apply neural networks and other modern machine learning techniques to furnish editorial assistance.  We look beyond traditional grammatical error checking to focus on facilitating writers by providing them with fluent, meaningful text editing support that…

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a…

Data-Driven Conversation

This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by…

NLPwin

Established: October 3, 2014

An introduction by Lucy Vanderwende, on behalf of everyone who contributed to the development of NLPwin

NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started…

Dialog and Conversational Systems Research

Established: March 14, 2014

Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety…

MSR SPLAT

Established: April 4, 2012

Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit. Its main goal is to allow easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic…

Microsoft Research ESL Assistant

Established: May 9, 2008

The Microsoft Research ESL Assistant is a web service that provides correction suggestions for typical ESL (English as a Second Language) errors. Such errors include, for example, the choice of determiners (the/a) and the choice…

Blews – what the blogosphere tells you about news

Established: February 18, 2008

While typical news-aggregation sites do a good job of clustering news stories according to topic, they leave the reader without information about which stories figure prominently in political discourse. BLEWS uses political blogs to categorize news stories according to their…

MindNet

Established: December 19, 2001

MindNet is a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text. MindNets are produced by a fully automatic process that takes the input text, sentence-breaks it, parses each sentence…

Posts

Speaking in Someone Else’s Language

Springtime on the sun-drenched Amalfi Coast. A perfect little café perched high above the sea, the scent of jasmine and lemon blossoms wafting past. You open the menu, hungry for lunch. Oh, wait—you don’t know any Italian. Now what? Not…

September 2013

Microsoft Research Blog

Software Aids Language Learners

By Gary Alt, Writer, Microsoft

Imagine mining the web to learn a language. No, not the jargon of webspeak, where IMHO means “in my humble opinion” or F2F is “face to face,” but real, spoken languages, such as Spanish, Hindi,…

September 2010

Microsoft Research Blog

Translator Fast-Tracks Haitian Creole

By Janie Chang, Writer, Microsoft Research

In disaster relief, every hour makes a difference, and communication is essential. When aid efforts began after the recent Haiti earthquake, a request came to the Machine Translation team within Microsoft Research’s Natural Language…

February 2010

Microsoft Research Blog

Translating the Web for the Entire World

By Rob Knies, Managing Editor, Microsoft Research

People all over the world use the Internet every day, to purchase goods or services, to search for information, to find diversions. But is the World Wide Web truly worldwide? It’s difficult to…

March 2008

Microsoft Research Blog