I have been with Microsoft Research since July 2000. My primary task has been to explore different techniques to make automatic speech recognition (ASR) more robust to additive and channel noise. Other projects I’ve worked on include general speech signal enhancement, pitch tracking, multiple-stream ASR, novel speech recognition features, the MiPad multimodal interface, cepstral compression and transport, and the WITTY microphone.

The SPLICE project was successful in building a more robust speech recognition system. Together with Alex Acero and Li Deng, I was able to get amazing results on the noisy Aurora2 corpus. The model-based feature enhancement project was meant to address the stereo-data requirement for SPLICE. The model describes how speech and noise (and noisy channels) interact to corrupt the speech features. It can be used either to enhance the features before recognition or to adapt the recognizer’s model at run time.

Lately, I’ve moved to more general non-parametric warpings of the feature space. We’re trying to learn transformations that improve recognition performance in both noisy and clean conditions.

I earned my Ph.D. in Electrical Engineering at the University of Washington’s Interactive Systems Design Laboratory in June of 2000. Early in my studies, I helped to develop a discrete theory for time-frequency representations of non-stationary audio signals. The application of this theory to speech recognition was the core of my thesis, “Time-Frequency Representations for Speech Recognition.” Other projects I worked on during this time included a GMM-based speaker verification system, subliminal audio message encoding, and non-linear signal morphing.

I also earned my MSEE at the University of Washington, in 1996, and my BSEE at Gonzaga University in Spokane, in 1994. My final project consisted of building a control system for a high-speed dot-matrix printer. I wrote a paper comparing the behavior of fuzzy controllers with that of linear controllers, and received first prize in the region’s IEEE paper contest.


Acoustic Modeling

Established: January 29, 2004

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The Hidden Markov Model (HMM) is one of the most common types of acoustic model. Other acoustic models include segmental models, super-segmental models…

Whistler Text-to-Speech Engine

Established: November 5, 2001

The talking computer HAL in the 1968 film "2001: A Space Odyssey" had an almost human voice, but it was the voice of an actor, not a computer. Getting a real computer to talk like HAL has proven one of the…

MIPAD: A Multimodal Interactive Prototype
