Microsoft researchers tie for best image captioning technology

Posted by Allison Linn

Researchers representing Microsoft and Google will present their latest advances Friday in automated image captioning, a hot field that could have broad implications for artificial intelligence.

The researchers will be speaking at a workshop that is part of CVPR, an annual conference on the most cutting-edge advances in computer vision research. The workshop is highlighting the winners of several image-related challenges.

The two companies’ research groups tied for first place in the recent MS COCO Image Captioning Challenge 2015. There were 15 submissions from top universities and industrial research labs vying to automatically create the most informative and interesting captions.

The winners were decided based on two main metrics: the share of captions that were equal to or better than a caption written by a person, and the share of captions that would pass a Turing test.

The Turing test, named after a paper published by Alan Turing in 1950, is a test of whether a human would believe something generated by a computer was actually written by a human.

The Microsoft team outperformed competitors on the Turing test element, while the Google team won for the share of captions that were as good as, or better than, what people could produce.

The field of automated image captioning has exploded since researchers hit upon the idea of using neural networks, which are computing elements that are modeled loosely after the human brain, to connect vision to language.

Many researchers see image captioning as the basis for more sophisticated artificial intelligence systems that can see, hear, speak and even understand.

Allison Linn is a senior writer at Microsoft Research. Follow Allison on Twitter.