Visual Understanding in Natural Language

December 12, 2017
Peter Anderson | Australian National University

Bridging visual and natural language understanding is a fundamental requirement for intelligent agents. This talk will focus mainly on automatic image captioning and visual question answering (VQA). I will cover some recent advances in automatic image caption evaluation, visual attention modeling, and generalization to images ‘in the wild’. I will also introduce my recent work on vision-and-language navigation (VLN), in which we situate agents in a new reinforcement learning (RL) environment constructed from dense RGB-D imagery of 90 real buildings.

- Xiaodong He, Principal Researcher, Research Manager
Research Area
- Human language technologies
