Structure Visual Understanding and Interaction with Human and Environment
- Jianwei Yang | Georgia Tech
The visual world around us is highly structured. As 2D projections of our world, images are also structured. In images, there is usually a background and some foreground objects (e.g., kites and birds in the sky, sheep and cows on the grass). Moreover, objects usually interact with each other in predictable ways (e.g., mugs are on tables, keyboards are below computer monitors, the sky is in the background). This structure in our world manifests itself in the visual data that captures the world around us. In this talk, I will discuss how to leverage this structure in our visual world for visual understanding and interactions with language and environment. Specifically, I will present: 1) how to learn to prune dense graphs and perform relational modeling for scene graph generation; 2) how to leverage structure in images for more grounded caption generation and question generation to actively acquire more information from humans; 3) how to learn a moving strategy for an embodied visual system in a 3D environment to achieve better visual perception through actions. Finally, I will briefly talk about my ongoing and future work, which aims to connect vision, language, and environment towards better visual understanding and interactions.
Speaker Details
Jianwei Yang is a fifth-year Ph.D. candidate in the College of Computing at Georgia Tech. His main research is in computer vision and its combination with language and embodiment. Specifically, his work focuses on how to extract structure from visual data and how to leverage interactions with humans and the environment to further improve the visual system. He has also worked on how to leverage structural understanding of images for other tasks, such as visual question answering and visual question generation. In the past, he has interned at Facebook AI Research (FAIR), Snap Research, and the MIT-IBM Watson AI Lab. More information can be found on his homepage: https://www.cc.gatech.edu/~jyang375/