Deep Learning for Text Processing


August 4, 2014


Li Deng, Eric Xing, Xiaodong He, Jianfeng Gao, Christopher Manning, Paul Smolensky, and Jeff A Bilmes


Microsoft Research Redmond; Carnegie Mellon University; Stanford University; Johns Hopkins University; University of Washington


Deep learning has enjoyed tremendous success in recent years in speech and visual object recognition, as well as in language processing (although to a somewhat lesser extent). The focus of this session is on deep learning approaches to problems in language and text processing, with particular emphasis on applications of vital significance to Microsoft. First, academic and Microsoft Research experts will provide a tutorial on the latest deep learning technology, presenting both theoretical and practical perspectives on common methods of deep neural networks and recurrent, recursive, stacking, and convolutional networks. We will highlight the special challenges faced by language and text processing, and elaborate on how new deep learning technologies are poised to fundamentally address them. We will also share Microsoft Research’s experience in developing Deep-Structured Semantic Models (DSSM) and their successful applications to web search, ads selection, machine translation, and entity search.



Li Deng was a professor at the University of Waterloo from 1989 to 1999 and then joined Microsoft Research Redmond, where he is a principal researcher. His recent research activities include deep learning and machine intelligence for speech and related information processing.

Dr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible worlds, and for building quantitative models and predictive understandings of biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University and another Ph.D. in Computer Science from UC Berkeley. His current work involves: (1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; (2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and (3) application of statistical learning in social networks, data mining, and vision.

Xiaodong He is a researcher at Microsoft Research, Redmond. He is also an affiliate professor in Electrical Engineering at the University of Washington, Seattle. His research interests include deep learning, spoken language understanding, machine translation, natural language processing, information retrieval, and machine learning. Dr. He has published a book and more than 60 technical papers in these areas, and gave a tutorial on speech translation at ICASSP 2013. In benchmark evaluations, he and his colleagues developed entries that placed first in Chinese-English translation in both the 2008 NIST Machine Translation Evaluation (NIST MT) and the 2011 International Workshop on Spoken Language Translation (IWSLT) evaluation. He has served as associate editor and guest editor of several IEEE journals and on the organizing committee of ICASSP 2013. He is a senior member of IEEE and a member of ACL.

Jianfeng Gao is a principal researcher in the Natural Language Processing Group at Microsoft Research. He recently joined the Deep Learning Technology Center (DLTC) at Microsoft Research, where he works on deep learning for text processing. From 2005 to 2006, he was a software developer in the Natural Interactive Services Division at Microsoft. From 1999 to 2005, he was a researcher in the Natural Language Computing Group at Microsoft Research Asia.

Christopher Manning is a professor of Computer Science and Linguistics at Stanford University. He received his Ph.D. from Stanford in 1995 and held faculty positions at Carnegie Mellon University and the University of Sydney before returning to Stanford. He is a fellow of the ACM, AAAI, and the Association for Computational Linguistics. Manning has coauthored leading textbooks on statistical approaches to natural language processing (Manning and Schuetze, 1999) and information retrieval (Manning, Raghavan, and Schuetze, 2008). His recent work has concentrated on probabilistic approaches to natural language processing (NLP) problems and computational semantics, including such topics as statistical parsing, robust textual inference, machine translation, large-scale joint inference for NLP, computational pragmatics, and hierarchical deep learning for NLP.

Paul Smolensky is Krieger-Eisenhower Professor of Cognitive Science at Johns Hopkins University. His research develops methods for performing grammatical computation in neural networks. As a member of the PDP Research Group at UCSD (1986), he developed Harmony Theory, proposing what is now known as the “Restricted Boltzmann Machine” architecture. He then developed Tensor Product Representations (1990), a compositional, recursive technique for encoding symbol structures as real-valued activation vectors. Combining these two theories, he developed Harmonic Grammar (1990, with G. Legendre and Y. Miyata) and then Optimality Theory (1993, with A. Prince), a grammatical formalism now widely used in phonological theory. He received the 2005 David E. Rumelhart Prize for Outstanding Contributions to the Formal Analysis of Human Cognition and will hold the Sapir Professorship at the 2015 LSA Linguistic Institute.

Jeff A. Bilmes is a professor in the Department of Electrical Engineering at the University of Washington, Seattle, and an adjunct professor in Computer Science & Engineering and in the Department of Linguistics. He received his Ph.D. in computer science from the University of California, Berkeley. He is a 2001 NSF CAREER award winner, a 2002 CRA Digital Government Fellow, a 2008 NAE Gilbreth Lectureship recipient, and a 2012/2013 ISCA Distinguished Lecturer. His primary interests lie in signal processing for pattern classification, speech recognition, language processing, bioinformatics, machine learning, graphical models, submodularity in combinatorial optimization and machine learning, active and semi-supervised learning, computer vision, and audio/music processing. Having begun work in this area in 2003, Prof. Bilmes was one of the first to apply submodularity to machine learning problems.