Andreas Stolcke

Principal Researcher


I am a researcher in the Speech and Dialog Research Group at Microsoft Research, working at Microsoft's Silicon Valley Campus. My interests include speech recognition and understanding, language modeling, speaker recognition, language and dialect recognition, machine translation, historical linguistics, and software tools for computational linguistics. I am also an External Fellow at the International Computer Science Institute (ICSI) in Berkeley, where I lead a joint project with Microsoft.

Prior to 2011, I worked in the Speech Technology and Research Laboratory at SRI International, mainly on government-funded research projects in speech recognition, speaker recognition, and machine translation, such as the DARPA projects EARS, GALE, and RATS. While at SRI, I also collaborated with ICSI on recognition of multiparty meetings and other projects, and developed the open-source SRI Language Modeling Toolkit.

My Ph.D. research was in Computer Science at the University of California, Berkeley, on parsing and Bayesian learning of stochastic grammars.


Human Parity in Speech Recognition

Established: December 1, 2015

This ongoing project aims to drive the state of the art in speech recognition toward matching, and ultimately surpassing, human performance, with a focus on unconstrained conversational speech. The goal is a moving target, as the scope of the task is broadened from high signal-to-noise speech between strangers (as in the Switchboard corpus) to include scenarios that make recognition more challenging, such as conversation among familiar speakers, multi-speaker meetings, and speech captured in noisy or distant-microphone environments.

Eye Gaze and Face Pose for Better Speech Recognition

Established: October 2, 2014

We want to use eye gaze and face pose to understand what users are looking at and attending to, and to use this information to improve speech recognition. Any sort of language constraint makes speech recognition and understanding easier, since we know what words might come next. Our work has shown significant performance improvements in all stages of the speech-processing pipeline, including addressee detection, speech recognition, and spoken-language understanding.

Dialog and Conversational Systems Research

Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety of settings, such as conversational systems in apps or devices, and situated interactions in the real world.

Meeting Recognition and Understanding

Established: July 30, 2013

In most organizations, staff spend many hours in meetings. This project addresses all levels of analysis and understanding, from speaker tracking and robust speech transcription to meaning extraction and summarization, with the goal of increasing productivity both during the meeting and after, for both participants and nonparticipants. The Meeting Recognition and Understanding project is a collection of online and offline spoken language understanding tasks. The following functions could be performed both on- and offline, but…

Recurrent Neural Networks for Language Processing

Established: November 23, 2012

This project focuses on advancing the state of the art in language processing with recurrent neural networks. We are currently applying these to language modeling, machine translation, speech recognition, language understanding, and meaning representation. A special interest is adding side-channels of information as input, to model phenomena that are not easily handled in other frameworks. A toolkit for doing RNN language modeling with side-information is in the associated download. Sample word vectors for use with this toolkit…

Speech Technology for Computational Phonetics and Reading Assessment

Established: March 1, 2011

This project aims to develop new tools for phonetics research on large speech corpora without requiring traditional phonetic annotations by humans. The idea is to adapt tools from speech recognition to replace the costly and time-consuming annotations usually required for phonetics research. This project originally started under an NSF grant, "New tools and methods for very-large-scale phonetics research," to UPenn and SRI, with a Microsoft researcher as a consultant. More recently, work on computational phonetics has…






An Introduction to Computational Networks and the Computational Network Toolkit
Dong Yu, Adam Eversole, Mike Seltzer, Kaisheng Yao, Oleksii Kuchaiev, Yu Zhang, Frank Seide, Zhiheng Huang, Brian Guenter, Huaming Wang, Jasha Droppo, Geoffrey Zweig, Chris Rossbach, Jie Gao, Andreas Stolcke, Jon Currey, Malcolm Slaney, Guoguo Chen, Amit Agarwal, Chris Basoglu, Marko Padmilac, Alexey Kamenev, Vladimir Ivanov, Scott Cypher, Hari Parthasarathi, Bhaskar Mitra, Baolin Peng, Xuedong Huang, Microsoft Research, October 1, 2014





The CALO Meeting Assistant System
Gokhan Tur, Andreas Stolcke, Lynn Voss, Stanley Peters, Dilek Hakkani-Tür, John Dowding, Benoit Favre, Raquel Fernández, Matthew Frampton, Mike Frandsen, Clint Frederickson, Martin Graciarena, Donald Kintzing, Kyle Leveque, Shane Mason, John Niekrasz, Matthew Purver, Korbinian Riedhammer, Elizabeth Shriberg, Jing Tien, Dimitra Vergyri, Fan Yang, August 1, 2010



Older Publications


Professional Activities