Natural Language Processing

Established: June 27, 2016

The Redmond-based Natural Language Processing group is focused on developing efficient algorithms to process texts and to make their information accessible to computer applications. Since text can contain information at many different granularities, from simple word or token-based representations, to rich hierarchical syntactic representations, to high-level logical representations across document collections, the group seeks to work at the right level of analysis for the application concerned.

Overview
The goal of the Natural Language Processing (NLP) group is to design and build software that will analyze, understand, and generate languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.

This goal is not easy to reach. “Understanding” language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It’s ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master. Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.

The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like “Flying planes can be dangerous”. Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should “can” be analyzed as a verb or as a noun? Which of the many possible meanings of “plane” is relevant? Depending on context, “plane” could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?

We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications like text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; Intellishrink uses natural language technology to compress cellphone messages; Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to enable any area where human users can benefit by communicating with their computers in a natural way.

People

Publications

Videos

Detecting Fake Reviews Link description

Detecting Fake Reviews

Date

October 16, 2012

Speakers

Bing Liu

Affiliation

University of Illinois at Chicago (UIC)

NW-NLP 2012 Afternoon Talks Link description

NW-NLP 2012 Afternoon Talks

Date

May 11, 2012

Speakers

Emily Prud'hommeaux, Congle Zhang, and Max Whitney

Affiliation

OHSU, University of Washington, Simon Fraser University

NW-NLP 2012 Morning Talks Link description

NW-NLP 2012 Morning Talks

Date

May 11, 2012

Speakers

Matt Hohensee, Anthony Stark, Shafiq Joty, and Ryan Georgi

Affiliation

University of Washington, OHSU, University of British Columbia

UW/MS symposium Link description

UW/MS symposium

Date

June 6, 2008

Speakers

Danyel Fisher, Douglas Downey, Chris Quirk, Scott Drellishak, Kelly O'Hara, Emily M. Bender, Sumit Basu, Matthew Hurst, Arnd Christian König, Michael Gamon, Chris Brockett, Dmitriy Belenko, Bill Dolan, Jianfeng Gao, and Lucy Vanderwende

Projects

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a…

Data-Driven Conversation

This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by…

NLPwin

Established: October 3, 2014

An introduction by Lucy Vanderwende* * on behalf of everyone who contributed to the development of NLPwin NLPwin is a software project at Microsoft Research that aims to provide Natural Language Processing tools for Windows (hence, NLPwin). The project was started…

Dialog and Conversational Systems Research

Established: March 14, 2014

Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunctions with additional modalities like gesture; where language is spoken or in text; and in a variety…

MSR SPLAT

Established: April 4, 2012

Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit. Its main goal is to allow easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic…

Microsoft Research ESL Assistant

Established: May 9, 2008

The Microsoft Research ESL Assistant is a web service that provides correction suggestions for typical ESL (English as a Second Language) errors. Such errors include, for example, the choice of determiners (the/a) and the choice…

Blews – what the blogosphere tells you about news

Established: February 18, 2008

While typical news-aggregation sites do a good job of clustering news stories according to topic, they leave the reader without information about which stories figure prominently in political discourse. BLEWS uses political blogs to categorize news stories according to their…

Machine Translation

Established: January 18, 2002

The principal focus of the Natural Language Processing group is to build a machine translation system that automatically learns translation mappings from bilingual corpora. <h1>Overview</h1> Machine Translation (MT) project at Microsoft Research is focused on creating MT systems and technologies…

MindNet

Established: December 19, 2001

Overview MindNet is a knowledge representation project that uses our broad-coverage parser to build semantic networks from dictionaries, encyclopedias, and free text. MindNets are produced by a fully automatic process that takes the input text, sentence-breaks it, parses each sentence…

Posts

Speaking in Someone Else’s Language

Springtime on the sun-drenched Amalfi Coast. A perfect little café perched high above the sea, the scent of jasmine and lemon blossoms wafting past. You open the menu, hungry for lunch. Oh, wait—you don’t know any Italian. Now what? Not…

September 2013

Microsoft Research Blog