Li Deng

Partner Research Manager

About

Li Deng (IEEE M’89; SM’92; F’04) received his Bachelor's degree from the University of Science and Technology of China (USTC; Guo Mo-Ruo Awardee), and his Master's and Ph.D. degrees from the University of Wisconsin-Madison, USA. He was an assistant professor (1989-1992), tenured associate professor (1992-1996), and full professor (1996-1999) at the University of Waterloo, Ontario, Canada. In 1999, he joined Microsoft Research, Redmond, USA, where he currently leads R&D of application-focused deep learning as Partner Research Manager of its Deep Learning Technology Center. Since January 2016, he has also taken on new responsibilities in the company as Chief Scientist of AI in Microsoft’s Applications and Service Group (ASG). Since 2000, he has been an Affiliate Full Professor and graduate committee member at the University of Washington, Seattle.

Prior to joining Microsoft, he also conducted research and taught at the Massachusetts Institute of Technology, ATR Interpreting Telecommunications Research Lab. (Kyoto, Japan), and HKUST. He has been granted over 70 US or international patents in acoustics/audio, speech/language technology, large-scale natural language and enterprise/internet data analysis, and machine learning, with a recent focus on deep learning. He has received numerous awards and honors from the IEEE, the International Speech Communication Association, the Acoustical Society of America, the Asia-Pacific Signal & Information Processing Association, Microsoft, and other organizations.

His current (and past) research activities include deep learning and machine intelligence applied to big data and to speech, text, image and multimodal processing, enterprise data analytics, computational neuroscience and information representation, deep/recurrent/dynamic neural networks, automatic speech and speaker recognition, spoken language identification and understanding, reading comprehension, dialogue systems, speech-to-speech translation, machine translation, language modeling, information retrieval, data mining, web search, neural information processing, dynamic systems, machine learning and optimization, parallel and distributed computing, probabilistic graphical models, audio and acoustic signal processing, image analysis and recognition, compressive sensing, statistical signal processing, digital communication, human speech production and perception, acoustic phonetics, auditory speech processing, auditory physiology and modeling, noise robust speech processing, speech synthesis and enhancement, multimedia signal processing, and multimodal human-computer interactions.

In the general areas of audio/speech/language technology and science, AI, machine learning, signal/information processing, and other areas of computer science, he has published over 300 refereed papers in leading journals and conferences, and authored or co-authored 5 books, including the most recent, Deep Learning: Methods and Applications and Automatic Speech Recognition: A Deep-Learning Approach (Springer). He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association. He served on the Board of Governors of the IEEE Signal Processing Society (2008-2010), and as Editor-in-Chief of the IEEE Signal Processing Magazine (2009-2011), which earned the highest impact factor in 2010 and 2011 among all IEEE publications and for which he received the 2012 IEEE SPS Meritorious Service Award. Most recently, he served as General Chair of IEEE ICASSP-2013, and as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing (2012-2014). His technical work since 2009 (when he initiated deep learning research and technology development at Microsoft with Geoff Hinton) and his leadership in industry-scale deep learning with colleagues have had a high impact on speech recognition and other areas of information processing. The work by him and the team he manages has been used in major Microsoft speech and text/data-related products, and has been recognized by the IEEE SPS Technical Achievement Award, IEEE SPS Best Paper Awards, the IEEE Outstanding Engineer Award, the APSIPA Industrial Distinguished Leader Award, and Microsoft Goldstar and Technology Transfer Awards.

Projects

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a…

Acoustic Modeling

Established: January 29, 2004

Acoustic modeling of speech typically refers to the process of establishing statistical representations for the feature vector sequences computed from the speech waveform. The Hidden Markov Model (HMM) is one of the most common types of acoustic models. Other acoustic models include segmental models, super-segmental models…

Whistler Text-to-Speech Engine

Established: November 5, 2001

The talking computer HAL in the 1968 film "2001: A Space Odyssey" had an almost human voice, but it was the voice of an actor, not a computer. Getting a real computer to talk like HAL has proven one of the…

Publications

2015

From Captions to Visual Concepts and Back
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, Larry Zitnick, Geoffrey Zweig, in The proceedings of CVPR, IEEE – Institute of Electrical and Electronics Engineers, June 1, 2015

2001

MIPAD: A Multimodal Interactive Prototype
Xuedong Huang, Alex Acero, C. Chelba, Li Deng, Jasha Droppo, D. Duchene, J. Goodman, Hsiao-Wuen Hon, D. Jacoby, L. Jiang, Ricky Loynd, Milind Mahajan, P. Mau, S. Meredith, S. Mughal, S. Neto, M. Plumpe, K. Steury, Gina Venolia, Kuansan Wang, Ye-Yi Wang, in International Conference on Acoustics, Speech, and Signal Processing, Institute of Electrical and Electronics Engineers, Inc., January 1, 2001

2000

MiPad: A Next Generation PDA Prototype
Xuedong Huang, Alex Acero, C. Chelba, Li Deng, Doug Duchene, J. Goodman, Hsiao-Wuen Hon, D. Jacoby, Li Jiang, Ricky Loynd, Milind Mahajan, P. Mau, S. Meredith, Salman Mughal, S. Neto, M. Plumpe, Kuansan Wang, Ye-Yi Wang, in International Conference on Spoken Language Processing, International Speech Communication Association, January 1, 2000

Projects

Deep Learning for Text Processing

Date

August 4, 2014

Speakers

Li Deng, Eric Xing, Xiaodong He, Jianfeng Gao, Christopher Manning, Paul Smolensky, and Jeff A Bilmes

Affiliation

MSR; Carnegie Mellon University; Microsoft Research, Redmond; MSR Redmond; Stanford; Johns Hopkins University; University of Washington

Other

Patents (Awarded and Pending)

  • Patents (awarded)

  • Computer-Implemented Deep Tensor Neural Network, U.S. Patent #9,292,787, granted on 3/22/2016
  • Tensor Deep Stacking Network, U.S. Patent #9,165,243, granted on October 20, 2015
  • Kernel deep convex networks and end-to-end learning, U.S. Patent #9,099,083, granted on August 4, 2015
  • Confidence calibration in automatic speech recognition systems, U.S. Patent #9,070,360, granted on June 30, 2015
  • Full-sequence training of deep structures for speech recognition, U.S. Patent #9,031,844, granted on May 12, 2015
  • Deep belief networks for large vocabulary continuous speech recognition, U.S. Patent #8,972,253, granted on March 3, 2015
  • Learning Processes For Single Hidden Layer Neural Networks With Linear Output Units, US Patent #8,918,352, granted on 12/23/2014
  • Exploiting Sparseness in Training Deep Neural Networks, filed 11/28/2011, US Patent #8700552, granted on 4/15/2014.
  • Online Distorted Speech Estimation Within An Unscented Transformation Framework, filed on 11/18/2010, US Patent #8731916, granted on 5/20/2014.
  • Deep Convex Network With Joint Use Of Nonlinear Random Projection, Restricted Boltzmann Machine And Batch-Based Parallelizable Optimization, filed 3/31/2011, US Patent #8489529, granted on 7/16/2013.
  • Deep structured conditional random fields for sequence labeling and classification, U.S. Patent; filed: 1/29/2010; granted on 6/25/2013, Patent #8,473,430.
  • Automatic reading feedback with parallel polarized language modeling, US Patent #8,433,576, granted on 4/30/2013
  • Generic framework for large-margin MCE training in speech recognition, US Patent #8,423,364, granted on 4/16/2013
  • Integrative and discriminative technique for spoken utterance translation, US Patent #8,407,041, granted on 3/26/2013
  • Speech recognition with non-linear noise reduction on Mel-frequency cepstra, US Patent #8,306,817, granted Nov. 6, 2012
  • Automatic Reading Tutoring, US Patent #8,306,822, granted Nov. 6, 2012
  • Adapting A Compressed Model For Use In Speech Recognition, US Patent #8,239,195, granted August 3, 2012
  • Phase Sensitive Model Adaptation For Noisy Speech Recognition, US Patent #8,214,215, granted July 3, 2012
  • Minimum classification error training with growth transformation optimization, US Patent #8,301,449, granted Oct. 30, 2012
  • Speech-centric multimodal user interface design in mobile technology, US Patent #8,219,406, granted July 10, 2012
  • High performance HMM adaptation with joint compensation of additive and convolutive distortions, US Patent #8,180,637, granted May 15, 2012
  • Piecewise-Based Variable-Parameter Hidden Markov Models and the Training Thereof, US Patent #8,160,878, granted April 17, 2012
  • Noise Suppressor for Robust Speech Recognition, US Patent #8,185,389, granted May 22, 2012
  • Parameter Clustering and Sharing for Variable-Parameter Hidden Markov Models, US Patent #8,145,488, granted March 27, 2012
  • Parameter Learning in Hidden Trajectory Model, US Patent #8,010,356, granted August 30, 2011
  • Time Synchronous Decoding for Long-Span Hidden Trajectory Model, US Patent #7,877,256, granted 2011
  • Integrated Speech Recognition and Semantic Classification, US Patent #7,856,351, granted 2011
  • Segment-Discriminating Minimum Classification Error Pattern Recognition, with X. He and Q. Fu, US Patent #7,873,209, granted Jan. 18, 2011
  • Hidden trajectory modeling with differential cepstra for speech recognition, U.S. Patent No.: 7,805,308; granted on September 28, 2010
  • Time Asynchronous Decoding for Long-Span Trajectory Model, US patent No.: 7,734,460, granted on June 8, 2010
  • Method and Apparatus for Constructing a Speech Filter Using Estimates of Clean Speech and Noise, U.S. Patent No.: 7,725,314; granted on May 25, 2010
  • Learning Statistically Characterized Resonance Targets in a Hidden Trajectory Model, US patent #7653535, granted January 2010.
  • Incrementally Regulated Discriminative Margins in MCE Training for Speech Recognition, US patent #7617103, granted Sept 2009
  • Quantitative model for formant dynamics and contextually assimilated reduction in fluent speech, US patent No.: 7,565,292, granted on July 21, 2009
  • Acoustic models with structured hidden dynamics with integration over many possible hidden trajectories, US patent No.: 7,565,284, granted on July 21, 2009
  • Speaker-adaptive Learning of Resonance Targets in a Hidden Trajectory Model of Speech Coarticulation, US patent No.: 7,519,531, granted on April 14, 2009
  • Greedy algorithm for identifying values for vocal tract resonance vectors, U.S. Patent No.: 7,475,011; Granted on January 6, 2009
  • Method of Speech Recognition Using Multimodal Variational Inference with Switching State Space Models, U.S. Patent No.: 7,480,615; Granted on January 20, 2009
  • Method of Speech Recognition Using Variables Representing Dynamic Aspects of Speech, U.S. Patent No.: 7,346,510; Granted on March 18, 2008
  • Method of Noise Reduction Using Instantaneous Signal-to-Noise Ratio as the Principal Quantity for Optimal Estimation, U.S. Patent No.: 7,363,221; Granted on April 22, 2008
  • Method and Apparatus for Formant Tracking Using a Residual Model, U.S. Patent No.: 7,424,423; Granted on September 9, 2008
  • Multi-Sensory Speech Enhancement Using Synthesized Sensory Signal, U.S. Patent No.: 7,406,303; Granted on July 29, 2008
  • Two-stage implementation for phonetic recognition using a bi-directional target-directed model of speech co-articulation and reduction, U.S. Patent No.: 7,409,346; Granted on August 5, 2008
  • Removing noise from feature vectors, U.S. Patent No.: 7,310,599; Granted on December 18, 2007
  • Method of determining uncertainty associated with acoustic distortion-based noise reduction, U.S. Patent No. 7,289,955; Granted on October 30, 2007
  • Method and apparatus for identifying noise environments from noisy signals, U.S. Patent No. 7,266,494; Granted on September 4, 2007
  • Method of noise reduction using correction and scaling vectors with partitioning of the acoustic space in the domain of noisy speech, U.S. Patent No. 7,254,536; Granted on August 7, 2007
  • Method of determining uncertainty in noise reduction, US and International Patents; U.S. Patent No.: 7,174,292; Granted on Feb. 6, 2007
  • Method of Noise Estimation Using Incremental Bayes Learning, U.S. Patent; Patent No.: 7,165,026; Granted on Jan. 16, 2007
  • Method of iterative noise estimation in a recursive framework, U.S. Patent; Patent No. 7,139,703; Granted on Nov. 21, 2006.
  • Method of noise reduction using correction vectors based on dynamic aspects of speech and noise normalization, United States Patent No. 7,117,148; Granted on October 3, 2006.
  • Method of noise reduction based on dynamic aspects of speech, United States Patent No. 7,107,210; Granted on Sept 12, 2006.
  • Method of pattern recognition using noise reduction uncertainty, United States Patent No. 7,103,540; Granted on Sept 5, 2006.
  • Microphone array signal enhancement using mixture models (jointly with Hagai Attias), United States Patent No. 7,103,541; Granted on Sept 5, 2006.
  • Efficient backward recursion for computing posterior probabilities, United States Patent No. 7,062,407; Granted on June 13, 2006.
  • Method of speech recognition using time-dependent interpolation and hidden dynamics, United States (and International) Patent No. 7,050,975; Granted on May 23, 2006.
  • Nonlinear observation models for removing noise from corrupted speech, United States (and International) Patent No. 7,047,047; Granted on May 16, 2006.
  • Method of Noise Reduction Using Correction and Scaling Vectors with Partitioning of the Acoustic Space in the Domain of Noisy Speech, United States Patent No. 7,003,455; Granted on February 21, 2006
  • Methods and Apparatus for Denoising and Dereverberation Using Variational Inference and Strong Speech Models, United States Patent No. 6,990,447; Granted on January 24, 2006
  • Method and Apparatus for Removing Noise from Feature Vectors, United States Patent No. 6,985,858; Granted on January 10, 2006
  • Methods for Including the Category of Environmental Noise When Processing Speech Signals, United States Patent No. 6,959,276; Granted on October 25, 2005
  • Method of iterative noise estimation in a recursive framework, United States Patent; Patent No. 6,944,590; Granted on September 13, 2005
  • Method of speech recognition using variational inference with switching state space models, United States Patent; Patent No. 6,931,374; Granted on August 16, 2005
  • Pattern Recognition Training Method and Apparatus Using Inserted Noise Followed by Noise Reduction, United States (and International) Patent; Patent No. 6,876,966; Granted on April 5, 2005
  • Apparatus for Speaker Clustering and for Speech Recognition, Patent No.: 2,965,537; Granted on Aug. 13, 1999; Countries of issue: United States and Japan.
  • Apparatus for Speaker Normalization Processor and for Voice Recognition Device, Patent No.: 2986792; Granted on Oct. 1, 1999; Countries of issue: United States and Japan.

  • Patents (pending)

  • Method of speech recognition using hidden trajectory hidden Markov models, U.S. Patent
  • Zero-variance model of acoustic environment for enhancing noisy speech features, U.S. Patent
  • Method and Apparatus for Multi-Sensory Speech Enhancement, International Patent
  • Method and apparatus for continuous valued vocal tract resonance tracking using piecewise linear approximation
  • Speech resonance target estimation using formant tracking results, U.S. Patent
  • Incrementally regulating discriminative margins in MCE training for speech recognition, U.S. Patent; filing date: 8/25/2006
  • Using a discretized, higher order representation of hidden dynamic variables for speech recognition, U.S. Patent; filing date: 8/21/2006
  • Integrated speech recognition and semantic classification, U.S. Patent; filing date: 1/19/2007
  • Segment-discriminating minimum classification error pattern recognition, U.S. Patent; filing date: 1/31/2007
  • Cross-lingual speech recognition with HMM using KL distance, U.S. Patent; filing date: April 2009
  • Maximum entropy model with continuous features, U.S. Patent; filing date: 4/1/2009
  • Discriminative learning of feature functions of generative type in speech translation, filed 10/28/2011
  • Discriminative pretraining of deep neural networks, filed 11/26/2011
  • Tensor Deep Stacking Networks, filed 2/15/2012.
  • Multilingual Deep Neural Network, filed 3/11/2013
  • Assignment of semantic labels to a sequence of words using neural network architectures, filed 9/2/2013
  • Deep structured semantic model produced using click-through data, filed 9/6/2013.
  • Convolutional Latent Semantic Models and Their Applications, filed 4/1/2014
  • Context-Sensitive Search Using a Deep Learning Model, filed 4/14/2014
  • Modeling Interestingness with Deep Neural Networks, filed 6/13/2014
  • Training and operations of computational models, US patent filed 6/29/2015
  • Leveraging global data for enterprise data analytics, US patent filed 7/24/2015
  • Representing learning using multi-task deep neural networks, US patent filed 7/28/2015
  • Semantically-relevant discovery of solutions, US patent filed 8/28/2015
  • Discovery of semantic similarities between images and text, US patent filed 8/28/2015
  • Multi-Stage Image Querying, filed with the U.S. Patent and Trademark Office on 4/12/2016.