Physics-Guided Vision-Language World Models for Agentic 4D Scene Understanding

This project develops a unified framework for physically grounded world modelling that combines video-based temporal prediction with Gaussian Splatting for photorealistic 3D representation. A Physics Vision-Language Model translates natural-language instructions into scene transformations that respect physical constraints, enabling interpretable, goal-directed control in dynamic scenes. By integrating perception, prediction, and action in a Vision-Language-Action loop, the research aims to advance agentic AI systems capable of transparent, physics-aware reasoning, with applications in robotics, simulation, and education.

People

Emanuele Aiello

Researcher

Benjamin Busam

Professor

Technische Universität München

Hyunjun Jung

Postdoctoral Researcher

Technische Universität München

Mert Kiray

PhD Student

Technische Universität München

Sarah Parisot

Principal Researcher

Sergio Valcarcel Macua

Senior Research Scientist

Dani Velikova

Postdoctoral Researcher

Technische Universität München