This project develops a unified framework for physically grounded world modelling that combines video-based temporal prediction with Gaussian Splatting for photorealistic 3D representation. A Physics Vision-Language Model translates natural-language instructions into transformations that respect physical constraints, enabling interpretable and goal-directed control in dynamic scenes. By integrating perception, prediction, and action in a Vision-Language-Action loop, the research aims to advance agentic AI systems capable of transparent, physics-aware reasoning—supporting applications in robotics, simulation, and education.
People
Emanuele Aiello, Researcher
Benjamin Busam, Professor, Technische Universität München
Hyunjun Jung, Postdoctoral Researcher, Technische Universität München
Mert Kiray, PhD Student, Technische Universität München
Sarah Parisot, Principal Researcher
Sergio Valcarcel Macua, Senior Research Scientist
Dani Velikova, Postdoctoral Researcher, Technische Universität München