Physics-Guided Vision-Language World Models for Agentic 4D Scene Understanding

This project develops a unified framework for physically grounded world modelling that combines video-based temporal prediction with Gaussian Splatting for photorealistic 3D representation. A Physics Vision-Language Model translates natural-language instructions into transformations that respect physical constraints, enabling interpretable and goal-directed control in dynamic scenes. By integrating perception, prediction, and action in a Vision-Language-Action loop, the research aims to advance agentic AI systems capable of transparent, physics-aware reasoning, with applications in robotics, simulation, and education.

This research is conducted through the Agentic AI Research and Innovation (AARI) Initiative, which focuses on the next frontier of agentic systems through Grand Challenges with the academic community and Microsoft Research.

People

Benjamin Busam

Professor

Technische Universität München

Hyunjun Jung

Post Doctoral Researcher

Technische Universität München

Alican Karaomer

PhD Student

Technische Universität München

Mert Kiray

PhD Student

Technische Universität München

Steven Kuang

PhD Student

Technische Universität München

Weihang Li

PhD Student

Technische Universität München

Sarah Parisot

Principal Researcher

Sergio Valcarcel Macua

Principal Research Scientist

Dani Velikova

Post Doctoral Researcher

Technische Universität München