Deep Attention Mechanism for Multimodal Intelligence: Perception, Reasoning, and Expression across Vision and Language

  • Xiaodong He | Microsoft Research

We have long envisioned that machines could one day perform human-like perception, reasoning, and expression across multiple modalities, including vision and language, which will augment and transform the ways humans communicate with each other and with the real world. With this vision in mind, I’ll use three tasks as examples to demonstrate recent progress in multimodal intelligence: image-to-language generation, visual question answering, and language-to-image synthesis. I’ll discuss the open problems behind these tasks that we are thrilled to solve, including image and language understanding, joint reasoning across both modalities, and expressing abstract concepts through natural language or image generation. I’ll also discuss the deep attention mechanisms recently developed to address these challenging problems, and analyze the interpretability and controllability of learning algorithms, which are of fundamental importance to general intelligence.

Speaker Details
Xiaodong He is a Researcher at Microsoft Research, Redmond, WA, USA. He is also an Affiliate Professor in Electrical Engineering at the University of Washington, Seattle, WA, USA. His research interests include deep learning, information retrieval, natural language understanding, machine translation, computer vision, and speech recognition. Dr. He has published a book and more than 70 technical papers in these areas, and has given tutorials at international conferences in these fields. In benchmark evaluations, he and his colleagues developed entries that took first place in the 2008 NIST Machine Translation Evaluation (NIST MT) and the 2011 International Workshop on Spoken Language Translation Evaluation (IWSLT), both in Chinese-English translation. He serves as Associate Editor of IEEE Signal Processing Magazine and IEEE Signal Processing Letters, as Guest Editor of IEEE TASLP for the Special Issue on Continuous-space and related methods in natural language processing, and as Area Chair of NAACL 2015. He has also served as Guest Editor for several IEEE journals, and on the organizing committees and program committees of major speech and language processing conferences. He is a senior member of IEEE and a member of ACL.