
Li Deng

Partner Research Manager


Li Deng (IEEE M’89, SM’92, F’04) received the Bachelor’s degree from the University of Science and Technology of China (USTC; Guo Mo-Ruo Awardee), and the Master’s and Ph.D. degrees from the University of Wisconsin-Madison, USA. He was an Assistant Professor (1989-1992), tenured Associate Professor (1992-1996), and Full Professor (1996-1999) at the University of Waterloo, Ontario, Canada. In 1999, he joined Microsoft Research, Redmond, USA, where he currently leads R&D of application-focused deep learning as a Partner Research Manager of its Deep Learning Technology Center. Since January 2016, he has also taken on new responsibilities in the company as Chief Scientist of AI in Microsoft’s Applications and Services Group (ASG). Since 2000, he has been an Affiliate Full Professor and graduate committee member at the University of Washington, Seattle.

Prior to joining Microsoft, he also conducted research and taught at Massachusetts Institute of Technology, ATR Interpreting Telecommunications Research Lab. (Kyoto, Japan), and HKUST. He has been granted over 70 US or international patents in acoustics/audio, speech/language technology, large-scale natural language and enterprise/internet data analysis, and in machine learning with recent focus on deep learning. He received numerous awards/honors bestowed by IEEE, International Speech Communication Association, Acoustical Society of America, Asia-Pacific Signal & Information Processing Association, Microsoft, and other organizations.

His current (and past) research activities include deep learning and machine intelligence applied to big data and to speech, text, image and multimodal processing, enterprise data analytics, computational neuroscience and information representation, deep/recurrent/dynamic neural networks, automatic speech and speaker recognition, spoken language identification and understanding, reading comprehension, dialogue systems, speech-to-speech translation, machine translation, language modeling, information retrieval, data mining, web search, neural information processing, dynamic systems, machine learning and optimization, parallel and distributed computing, probabilistic graphical models, audio and acoustic signal processing, image analysis and recognition, compressive sensing, statistical signal processing, digital communication, human speech production and perception, acoustic phonetics, auditory speech processing, auditory physiology and modeling, noise robust speech processing, speech synthesis and enhancement, multimedia signal processing, and multimodal human-computer interactions.

In the general areas of audio/speech/language technology and science, AI, machine learning, signal/information processing, and other areas of computer science, he has published over 300 refereed papers in leading journals and conferences, and authored or co-authored 5 books, including the recent Deep Learning: Methods and Applications and Automatic Speech Recognition: A Deep-Learning Approach (Springer). He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He served on the Board of Governors of the IEEE Signal Processing Society (2008-2010) and as Editor-in-Chief of the IEEE Signal Processing Magazine (2009-2011), which earned the highest impact factor in 2010 and 2011 among all IEEE publications and for which he received the 2012 IEEE SPS Meritorious Service Award. Most recently, he served as General Chair of IEEE ICASSP 2013 and as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing (2012-2014). His technical work since 2009 (when he initiated deep learning research and technology development at Microsoft with Geoff Hinton) and his leadership in industry-scale deep learning with colleagues have created high impact in speech recognition and other areas of information processing. The work by him and the team he manages has been used in major Microsoft speech and text/data-related products, and has been recognized with the IEEE SPS Technical Achievement Award, IEEE SPS Best Paper Awards, the IEEE Outstanding Engineer Award, the APSIPA Industrial Distinguished Leader Award, and Microsoft Gold Star and Technology Transfer Awards.


JointSLU: Joint Semantic Frame Parsing for Spoken Language Understanding

Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most previous studies explored this framework by building single-domain models for each task, such as slot filling or domain classification, comparing deep-learning-based approaches with conventional ones such as conditional random fields. This project focuses on a holistic multi-domain, multi-task (i.e., slot filling, domain detection, and intent detection) modeling approach to estimate complete semantic frames…

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. Human judges consider its captions to be as good as or better than those written by humans 34% of the time.

Spoken Language Understanding

Established: May 1, 2013

Spoken language understanding (SLU) is an emerging field at the intersection of speech processing and natural language processing. The term spoken language understanding has largely been coined for targeted understanding of human speech directed at machines. This project covers our research on SLU tasks such as domain detection, intent determination, and slot filling, using data-driven methods.

Projects:
  • Deeper Understanding: Moving beyond shallow targeted understanding towards building domain-independent SLU models.
  • Scaling SLU: Quickly bootstrapping SLU…

Understand User’s Intent from Speech and Text

Established: December 17, 2008

Understanding what users would like to do or what information they need is critical in human-computer interaction. When a natural user interface such as speech or natural language is used in human-computer interaction, for example in a spoken dialogue system or with an internet search engine, language understanding becomes an important issue. Intent understanding is about identifying the action a user wants a computer to take, or the information she/he would like to obtain, conveyed in a spoken utterance or…

Voice Search: Say What You Want and Get It

Established: December 15, 2008

In the Voice Search project, we envision a future where you can ask your cellphone for any kind of information and get it. With a small cellphone, there is a heavy tax on traditional keyboard-based information entry, and we believe it can be significantly more convenient to communicate by voice. Our work focuses on making this communication more reliable, and able to cover the full range of information needed in daily life.

Acoustic Modeling

Established: January 29, 2004

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The hidden Markov model (HMM) is the most common type of acoustic model. Other acoustic models include segmental models, super-segmental models (including hidden dynamic models), neural networks, maximum entropy models, and (hidden) conditional random fields. Acoustic modeling also encompasses "pronunciation modeling", which describes how a sequence or multi-sequences of fundamental speech units (such as phones or…
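To make the HMM idea above concrete, here is a minimal toy sketch of the forward algorithm, which computes the likelihood of an observation sequence under an HMM. All probabilities below are invented for illustration; real acoustic models use continuous feature vectors with Gaussian-mixture or neural-network emission densities rather than this two-state, two-symbol toy.

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 2 discrete observation symbols.
A = np.array([[0.7, 0.3],      # state-transition probabilities A[i, j] = P(j | i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities B[i, k] = P(symbol k | state i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward_likelihood(obs):
    """Return P(obs | model) via the forward recursion."""
    alpha = pi * B[:, obs[0]]          # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
    return alpha.sum()                 # marginalize over the final state

print(forward_likelihood([0, 1, 0]))
```

In practice the recursion is run in the log domain (or with per-frame scaling) to avoid underflow on long utterances, and decoding uses the closely related Viterbi recursion instead of the sum.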

Who Is Talking To You (WITTY)

Established: August 9, 2003

Mission Statement

Exploit multi-sensory information to improve user experience in:
  • Speech-centric human-computer interaction
  • Computer-mediated human inter-communication

Goals

  • Understand end-users' requirements
  • Identify sensor requirements
  • Prototype new hardware
  • Develop robust technologies

Publications

  • Air-and-Bone Conductive Integrated Microphones for Robust Speech Detection and Enhancement, in the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU03), November 30 - December 4, 2003.
  • Multi-Sensory Microphones for Robust Speech Detection, Enhancement and Recognition, in the IEEE…

Your Pad or MiPad

Established: February 19, 2002

It only took one scientist mumbling at a monitor to give birth to the idea that a computer should be able to listen, understand, and even talk back. But years of effort haven't gotten us closer to the Jetsons dream: a computer that listens better than your spouse, better than your boss, and even better than your dog Spot. Using state-of-the-art speech recognition, and strengthening this new science with pen input,…

Noise Robust Speech Recognition

Established: February 19, 2002

Techniques to improve the robustness of automatic speech recognition systems to noise and channel mismatches.

Robustness of ASR Technology to Background Noise

You have probably seen that most people using speech dictation software are wearing a close-talking microphone. So why has senior researcher Li Deng been trying to get rid of close-talking microphones? Close-talking microphones pick up relatively little background noise, and speech recognition systems can obtain decent accuracy with them. If you are…

Whistler Text-to-Speech Engine

Established: November 5, 2001

The talking computer HAL in the 1968 film "2001: A Space Odyssey" had an almost human voice, but it was the voice of an actor, not a computer. Getting a real computer to talk like HAL has proven one of the toughest problems posed by "2001". Microsoft's contribution to this field is "Whistler" (Windows Highly Intelligent STochastic taLkER), a trainable text-to-speech engine that was released in 1998 as part of the SAPI 4.0 SDK, and then as…

From Captions to Visual Concepts and Back
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, Larry Zitnick, Geoffrey Zweig, in Proceedings of CVPR, IEEE, June 2015.

MIPAD: A Multimodal Interactive Prototype
Xuedong Huang, Alex Acero, C. Chelba, Li Deng, Jasha Droppo, D. Duchene, J. Goodman, Hsiao-Wuen Hon, D. Jacoby, L. Jiang, Ricky Loynd, Milind Mahajan, P. Mau, S. Meredith, S. Mughal, S. Neto, M. Plumpe, K. Steury, Gina Venolia, Kuansan Wang, Ye-Yi Wang, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, 2001.


MiPad: A Next Generation PDA Prototype
Xuedong Huang, Alex Acero, C. Chelba, Li Deng, Doug Duchene, J. Goodman, Hsiao-Wuen Hon, D. Jacoby, Li Jiang, Ricky Loynd, Milind Mahajan, P. Mau, S. Meredith, Salman Mughal, S. Neto, M. Plumpe, Kuansan Wang, Ye-Yi Wang, in Proceedings of the International Conference on Spoken Language Processing (ICSLP), International Speech Communication Association, 2000.


Deep Learning for Text Processing

August 4, 2014

Li Deng, Eric Xing, Xiaodong He, Jianfeng Gao, Christopher Manning, Paul Smolensky, and Jeff A. Bilmes

Microsoft Research Redmond, Carnegie Mellon University, Stanford University, Johns Hopkins University, and University of Washington

Professional Activities and Honors/Awards


Patents (Awarded)

  • Deep structured semantic model produced using click-through data, U.S. Patent #9,519,859 granted on 12/13/2016
  • Convolutional latent semantic models and their applications, U.S. Patent #9,477,654 granted on 10/25/2016
  • Computer-Implemented Deep Tensor Neural Network, U.S. Patent #9,292,787 granted on 3/22/2016
  • Discriminative pre-training of deep neural networks, U.S. Patent #9,235,799 granted on 1/12/2016
  • Tensor Deep Stacking Network, U.S. Patent #9,165,243, granted on October 20, 2015
  • Kernel deep convex networks and end-to-end learning, U.S. Patent #9,099,083, granted on August 4, 2015
  • Confidence calibration in automatic speech recognition systems, U.S. Patent #9,070,360, granted on June 30, 2015
  • Full-sequence training of deep structures for speech recognition, U.S. Patent #9,031,844, granted on May 12, 2015
  • Deep belief networks for large vocabulary continuous speech recognition, U.S. Patent #8,972,253, granted on March 3, 2015
  • Learning Processes For Single Hidden Layer Neural Networks With Linear Output Units, US Patent #8,918,352, granted on 12/23/2014
  • Exploiting Sparseness in Training Deep Neural Networks, filed 11/28/2011, US Patent #8,700,552, granted on 4/15/2014
  • Online Distorted Speech Estimation Within An Unscented Transformation Framework, filed on 11/18/2010, US Patent #8,731,916, granted on 5/20/2014
  • Deep Convex Network With Joint Use Of Nonlinear Random Projection, Restricted Boltzmann Machine And Batch-Based Parallelizable Optimization, filed 3/31/2011, US Patent #8,489,529, granted on 7/16/2013
  • Deep structured conditional random fields for sequence labeling and classification, U.S. Patent #8,473,430; filed 1/29/2010; granted on 6/25/2013
  • Automatic reading feedback with parallel polarized language modeling, US Patent #8,433,576, granted on 4/30/2013
  • Generic framework for large-margin MCE training in speech recognition, US Patent #8,423,364, granted on 4/16/2013
  • Integrative and discriminative technique for spoken utterance translation, US Patent #8,407,041, granted on 3/26/2013
  • Speech recognition with non-linear noise reduction on Mel-frequency cepstra, US Patent #8,306,817, granted Nov. 6, 2012
  • Automatic Reading Tutoring, US Patent #8,306,822, granted Nov. 6, 2012
  • Adapting A Compressed Model For Use In Speech Recognition, US Patent #8,239,195, granted August 3, 2012
  • Phase Sensitive Model Adaptation For Noisy Speech Recognition, US Patent #8,214,215, granted July 3, 2012
  • Minimum classification error training with growth transformation optimization, US Patent #8,301,449, granted Oct. 30, 2012
  • Speech-centric multimodal user interface design in mobile technology, US Patent #8,219,406, granted July 10, 2012
  • High performance HMM adaptation with joint compensation of additive and convolutive distortions, US Patent #8,180,637, granted May 15, 2012
  • Piecewise-Based Variable-Parameter Hidden Markov Models and the Training Thereof, US Patent #8,160,878, granted April 17, 2012
  • Noise Suppressor for Robust Speech Recognition, US Patent #8,185,389, granted May 22, 2012
  • Parameter Clustering and Sharing for Variable-Parameter Hidden Markov Models, US Patent #8,145,488, granted March 27, 2012
  • Parameter Learning in Hidden Trajectory Model, (U.S. Patent #8,010,356, granted August 30, 2011)
  • Time Synchronous Decoding for Long-Span Hidden Trajectory Model, (US patent #7,877,256, granted 2011)
  • Integrated Speech Recognition and Semantic Classification (granted 2011, US patent #7,856,351)
  • Segment-Discriminating Minimum Classification Error Pattern Recognition, with X. He and Q. Fu (granted Jan 18, 2011, US patent #7,873,209)
  • Hidden trajectory modeling with differential cepstra for speech recognition, U.S. Patent No.: 7,805,308; granted on September 28, 2010
  • Time Asynchronous Decoding for Long-Span Trajectory Model, US Patent #7,734,460, granted on June 8, 2010
  • Method and Apparatus for Constructing a Speech Filter Using Estimates of Clean Speech and Noise, U.S. Patent #7,725,314; granted on May 25, 2010
  • Learning Statistically Characterized Resonance Targets in a Hidden Trajectory Model, US Patent #7,653,535, granted January 2010
  • Incrementally Regulated Discriminative Margins in MCE Training for Speech Recognition, US Patent #7,617,103, granted Sept 2009
  • Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech, US patent No.: 7,565,292, granted on July 21, 2009
  • Acoustic models with structured hidden dynamics with integration over many possible hidden trajectories, US patent No.: 7,565,284, granted on July 21, 2009
  • Speaker-adaptive Learning of Resonance Targets in a Hidden Trajectory Model of Speech Coarticulation, US patent No.: 7,519,531, granted on April 14, 2009
  • Greedy algorithm for identifying values for vocal tract resonance vectors, U.S. Patent No.: 7,475,011; Granted on January 6, 2009
  • Method of Speech Recognition Using Multimodal Variational Inference with Switching State Space Models, U.S. Patent No.: 7,480,615; Granted on January 20, 2009
  • Method of Speech Recognition Using Variables Representing Dynamic Aspects of Speech, U.S. Patent No.: 7,346,510; Granted on March 18, 2008
  • Method of Noise Reduction Using Instantaneous Signal-to-Noise Ratio as the Principal Quantity for Optimal Estimation, U.S. Patent No.: 7,363,221; Granted on April 22, 2008
  • Method and Apparatus for Formant Tracking Using a Residual Model, U.S. Patent No.: 7,424,423; Granted on September 9, 2008
  • Multi-Sensory Speech Enhancement Using Synthesized Sensory Signal, U.S. Patent No.: 7,406,303; Granted on July 29, 2008
  • Two-stage implementation for phonetic recognition using a bi-directional target-directed model of speech co-articulation and reduction, U.S. Patent No.: 7,409,346; Granted on August 5, 2008
  • Removing noise from feature vectors, U.S. Patent No.: 7,310,599; Granted on December 18, 2007;
  • Method of determining uncertainty associated with acoustic distortion-based noise reduction, U.S. Patent No. 7,289,955; Granted on October 30, 2007
  • Method and apparatus for identifying noise environments from noisy signals, U.S. Patent No. 7,266,494; Granted on September 4, 2007
  • Method of noise reduction using correction and scaling vectors with partitioning of the acoustic space in the domain of noisy speech, U.S. Patent No. 7,254,536; Granted on August 7, 2007
  • Method of determining uncertainty in noise reduction, US and International Patents; U.S. Patent No.: 7,174,292; Granted on Feb. 6, 2007
  • Method of Noise Estimation Using Incremental Bayes Learning, U.S. Patent; Patent No.: 7,165,026; Granted on Jan. 16, 2007
  • Method of iterative noise estimation in a recursive framework, U.S. Patent; Patent No. 7,139,703; Granted on Nov. 21, 2006.
  • Method of noise reduction using correction vectors based on dynamic aspects of speech and noise normalization, United States Patent No. 7,117,148; Granted on October 3, 2006.
  • Method of noise reduction based on dynamic aspects of speech, United States Patent No. 7,107,210; Granted on Sept 12, 2006.
  • Method of pattern recognition using noise reduction uncertainty, United States Patent No. 7,103,540; Granted on Sept 5, 2006.
  • Microphone array signal enhancement using mixture models (jointly with Haggai Attias), United States Patent No. 7,103,541; Granted on Sept 5, 2006.
  • Efficient backward recursion for computing posterior probabilities, United States Patent No. 7,062,407; Granted on June 13, 2006.
  • Method of speech recognition using time-dependent interpolation and hidden dynamics, United States (and International) Patent No. 7,050,975; Granted on May 23, 2006.
  • Nonlinear observation models for removing noise from corrupted speech, United States (and International) Patent No. 7,047,047; Granted on May 16, 2006.
  • Method of Noise Reduction Using Correction and Scaling Vectors with Partitioning of the Acoustic Space in the Domain of Noisy Speech, United States Patent No. 7,003,455; Granted on February 21, 2006
  • Methods and Apparatus for Denoising and Dereverberation Using Variational Inference and Strong Speech Models, United States Patent No. 6,990,447; Granted on January 24, 2006
  • Method and Apparatus for Removing Noise from Feature Vectors, United States Patent No. 6,985,858; Granted on January 10, 2006
  • Methods for Including the Category of Environmental Noise When Processing Speech Signals, United States Patent No. 6,959,276; Granted on October 25, 2005
  • Method of iterative noise estimation in a recursive framework, United States Patent; Patent No. 6,944,590; Granted on September 13, 2005
  • Method of speech recognition using variational inference with switching state space models, United States Patent; Patent No. 6,931,374; Granted on August 16, 2005
  • Pattern Recognition Training Method and Apparatus Using Inserted Noise Followed by Noise Reduction, United States (and International) Patent; Patent No. 6,876,966; Granted on April 5, 2005
  • Apparatus for Speaker Clustering and for Speech Recognition, Patent No.: 2,965,537; Granted on Aug. 13, 1999; Countries of issue: United States and Japan.
  • Apparatus for Speaker Normalization Processor and for Voice Recognition Device, Patent No.: 2986792; Granted on Oct. 1, 1999; Countries of issue: United States and Japan.

Patents (Pending)

  • Method of speech recognition using hidden trajectory hidden Markov models, U.S. Patent
  • Zero-variance model of acoustic environment for enhancing noisy speech features, U.S. Patent
  • Method and Apparatus for Multi-Sensory Speech Enhancement, International Patent
  • Method and apparatus for continuous valued vocal tract resonance tracking using piecewise linear approximation
  • Speech resonance target estimation using formant tracking results, U.S. Patent
  • Incrementally regulating discriminative margins in MCE training for speech recognition, U.S. Patent; filing date: 8/25/2006
  • Using a discretized, higher order representation of hidden dynamic variables for speech recognition, U.S. Patent; filing date: 8/21/2006
  • Integrated speech recognition and semantic classification, U.S. Patent; filing date: 1/19/2007
  • Segment-discriminating minimum classification error pattern recognition, U.S. Patent; filing date: 1/31/2007
  • Maximum entropy model with continuous features, U.S. Patent; filing date: 4/1/2009
  • Cross-lingual speech recognition with HMM using KL distance, U.S. Patent; filing date: April 2009
  • Discriminative learning of feature functions of generative type in speech translation, filed 10/28/2011
  • Discriminative pretraining of deep neural networks, filed 11/26/2011
  • Tensor Deep Stacking Networks, filed 2/15/2012.
  • Multilingual Deep Neural Network, filed 3/11/2013
  • Assignment of semantic labels to a sequence of words using neural network architectures, filed 9/2/2013
  • Deep structured semantic model produced using click-through data. filed 9/6/2013.
  • Convolutional Latent Semantic Models and Their Applications. filed 4/1/2014
  • Context-Sensitive Search Using a Deep Learning Model, filed 4/14/2014
  • Modeling Interestingness with Deep Neural Networks, filed 6/13/2014
  • Training and operations of computational models, US patent filed 6/29/2015
  • Leveraging global data for enterprise data analytics, US patent filed 7/24/2015
  • Representing learning using multi-task deep neural networks, US patent filed 7/28/2015
  • Semantically-relevant discovery of solutions, US patent filed 8/28/2015
  • Discovery of semantic similarities between images and text, US patent filed 8/28/2015
  • Multi-modal controller, US patent filed 12/30/2015
  • Multi-Stage Image Querying, filed with the U.S. Patent and Trademark Office on 4/12/2016.
  • Multiple action computational model training and operation, US patent filed 3/29/2016
  • Computational-model operation using multiple subject representations, US patent filed 3/29/2016
  • End-to-end memory networks for contextual language understanding, US patent filed 8/4/2016
  • Multi-domain joint semantic frame parsing, US patent filed 8/4/2016
  • Knowledge-guided structural attention processing, US patent filed 9/7/2016


News about My and My Collaborators’ Recent Technical Work: Deep Learning, Lecture Materials/Videos, Presentation Slides, etc.