Microsoft Research Blog

Novel object captioning surpasses human performance on benchmarks 

October 14, 2020 | Kevin Lin, Xiaowei Hu, and Lijuan Wang

Consider for a moment what it takes to visually identify and describe something to another person. Now imagine that the other person can’t see the object or image, so every detail matters. How do you decide what information is important…

Microsoft Research Blog

Objects are the secret key to revealing the world between vision and language 

May 15, 2020 | Chunyuan Li, Lei Zhang, and Jianfeng Gao

Humans perceive the world through many channels, such as images viewed by the eyes or voices heard by the ears. Though any individual channel might be incomplete or noisy, humans can naturally align and fuse the information collected from multiple…

Microsoft Research Blog

A deep generative model trifecta: Three advances that work towards harnessing large-scale power 

April 9, 2020 | Chunyuan Li and Jianfeng Gao

One of the core aspirations in artificial intelligence is to develop algorithms and techniques that endow computers with an ability to synthesize the observed data in our world. Every time researchers build a model to imitate this ability, this model…

Microsoft Research Podcast

Going deep on deep learning with Dr. Jianfeng Gao 

January 29, 2020

Dr. Jianfeng Gao is a veteran computer scientist, an IEEE Fellow and the current head of the Deep Learning Group at Microsoft Research. He and his team are exploring novel approaches to advancing the state-of-the-art on deep learning in areas…

Microsoft Research Blog

Next-generation architectures bridge gap between neural and symbolic representations with neural symbols 

December 12, 2019 | Paul Smolensky

In both language and mathematics, symbols and their mutual relationships play a central role. The equation x = 1/y asserts that the symbols x and y—that is, what they stand for—are related reciprocally; "Kim saw the movie" asserts that Kim and…

Microsoft Research Blog

Expanding scene and language understanding with large-scale pre-training and a unified architecture 

October 8, 2019 | Hamid Palangi

Making sense of the world around us is a skill we as human beings begin to learn from an early age. Though there is still much to know about the process, we can see that people learn a lot, both…

Microsoft Research Blog

See what we mean – Visually grounded natural language navigation is going places 

June 18, 2019

How do humans communicate efficiently? The common belief is that the words humans use to communicate – dog, for instance – invoke a similar understanding of the physical concepts. Indeed, there exists a common conception about the physical appearance…

In the news | Synced

ICLR 2019 | MILA, Microsoft, and MIT Share Best Paper Honours 

June 5, 2019

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks, from the Montreal Institute for Learning Algorithms (MILA) and the Microsoft Research Montréal lab, was one of two Best Paper winners at ICLR 2019.

Microsoft Research Blog

Less pain, more gain: A simple method for VAE training with less of that KL-vanishing agony 

April 15, 2019 | Chunyuan Li

There is a growing interest in exploring the use of variational auto-encoders (VAE), a deep latent variable model, for text generation. Compared to the standard RNN-based language model that generates sentences one word at a time without the explicit guidance…
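KL vanishing refers to the KL term of the VAE objective collapsing toward zero early in training, so the decoder learns to ignore the latent variable. One common remedy in this line of work is to anneal the weight on the KL term rather than fixing it at 1. Below is a minimal Python sketch of a cyclical annealing schedule; the function name, defaults, and loss composition are illustrative assumptions, not code from the post.

    def kl_weight(step, total_steps, n_cycles=4, ramp_fraction=0.5):
        """Weight on the KL term of the VAE objective at a given training step.

        Within each cycle the weight ramps linearly from 0 to 1 over the first
        `ramp_fraction` of the cycle, then stays at 1 for the remainder.
        """
        cycle_len = total_steps / n_cycles
        pos = (step % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
        return min(1.0, pos / ramp_fraction)

    # The schedule scales only the KL term of the negative ELBO:
    # loss = reconstruction_nll + kl_weight(step, total_steps) * kl_divergence

Letting the weight return to zero periodically gives the decoder stretches of training in which it can learn to use the latent code before the KL penalty is reapplied.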
