Visual Understanding in Natural Language
- Peter Anderson | Australian National University
Bridging visual and natural language understanding is a fundamental requirement for intelligent agents. This talk will focus mainly on automatic image captioning and visual question answering (VQA). I will cover recent advances in automatic image caption evaluation, visual attention modeling, and generalization to images ‘in the wild’. I will also introduce my recent work on vision-and-language navigation (VLN), in which we situate agents in a new reinforcement learning (RL) environment constructed from dense RGB-D imagery of 90 real buildings.
Xiaodong He
Principal Researcher, Research Manager