Jasha Droppo

Principal Researcher


I have been with Microsoft Research since July 2000. My primary goal is to build automatic speech recognition systems that are as good as, or better than, humans.

Other projects I’ve worked on include noise robust speech recognition, general speech signal enhancement, pitch tracking, multiple stream ASR, novel speech recognition features, MiPad multimodal interface, cepstral compression and transport, and the WITTY microphone.

I earned my Ph.D. in Electrical Engineering at the University of Washington’s Interactive Systems Design Laboratory in June of 2000. Early in my studies, I helped to develop a discrete theory for time-frequency representations of non-stationary audio signals. The application of this theory to speech recognition was the core of my thesis, “Time-Frequency Representations for Speech Recognition.” Other projects I worked on during this time included a GMM-based speaker verification system, subliminal audio message encoding, and non-linear signal morphing.

I also earned my MSEE at the University of Washington, in 1996. I earned my BSEE from Gonzaga University in Spokane, in 1994. My final project consisted of building a control system for a high-speed dot-matrix printer. I wrote a paper comparing the behavior of fuzzy controllers with that of linear controllers, and received first prize in the region’s IEEE paper contest.


Voice Search: Say What You Want and Get It

Established: December 15, 2008

In the Voice Search project, we envision a future where you can ask your cellphone for any kind of information and get it. With a small cellphone, there is a heavy tax on traditional keyboard-based information entry, and we believe it can be significantly more convenient to communicate by voice. Our work focuses on making this communication more reliable and able to cover the full range of information needed in daily life.

Acoustic Modeling

Established: January 29, 2004

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The Hidden Markov Model (HMM) is one of the most common types of acoustic model. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields. Acoustic modeling also encompasses "pronunciation modeling", which describes how a sequence or multi-sequences of fundamental speech units (such as phones or…


Established: February 19, 2002

Your Pad or MiPad: It only took one scientist mumbling at a monitor to give birth to the idea that a computer should be able to listen, understand, and even talk back. But years of effort haven't gotten us closer to the Jetson dream: a computer that listens better than your spouse, better than your boss, and even better than your dog Spot. Using state-of-the-art speech recognition, and strengthening this new science with pen input,…

Noise Robust Speech Recognition

Established: February 19, 2002

Techniques to improve the robustness of automatic speech recognition systems to noise and channel mismatches. Robustness of ASR Technology to Background Noise: You have probably seen that most people using speech dictation software wear a close-talking microphone. So why has senior researcher Li Deng been trying to get rid of close-talking microphones? Close-talking microphones pick up relatively little background noise, and speech recognition systems can obtain decent accuracy with them. If you are…

Whistler Text-to-Speech Engine

Established: November 5, 2001

The talking computer HAL in the 1968 film "2001: A Space Odyssey" had an almost human voice, but it was the voice of an actor, not a computer. Getting a real computer to talk like HAL has proven one of the toughest problems posed by "2001." Microsoft's contribution to this field is "Whistler" (Windows Highly Intelligent STochastic taLkER), a trainable text-to-speech engine which was released in 1998 as part of the SAPI4.0 SDK, and then as…

An Introduction to Computational Networks and the Computational Network Toolkit
Dong Yu, Adam Eversole, Mike Seltzer, Kaisheng Yao, Oleksii Kuchaiev, Yu Zhang, Frank Seide, Zhiheng Huang, Brian Guenter, Huaming Wang, Jasha Droppo, Geoffrey Zweig, Chris Rossbach, Jie Gao, Andreas Stolcke, Jon Currey, Malcolm Slaney, Guoguo Chen, Amit Agarwal, Chris Basoglu, Marko Padmilac, Alexey Kamenev, Vladimir Ivanov, Scott Cypher, Hari Parthasarathi, Bhaskar Mitra, Baolin Peng, Xuedong Huang, Microsoft Research, October 1, 2014

MIPAD: A Multimodal Interactive Prototype
Xuedong Huang, Alex Acero, C. Chelba, Li Deng, Jasha Droppo, D. Duchene, J. Goodman, Hsiao-Wuen Hon, D. Jacoby, L. Jiang, Ricky Loynd, Milind Mahajan, P. Mau, S. Meredith, S. Mughal, S. Neto, M. Plumpe, K. Steury, Gina Venolia, Kuansan Wang, Ye-Yi Wang, in International Conference on Acoustics, Speech, and Signal Processing, Institute of Electrical and Electronics Engineers, Inc., January 1, 2001