Interactive World Simulator

Learning to simulate the visual world from large-scale videos.

Video Tokenization

VidTok: a cutting-edge family of video tokenizers that excels at both continuous and discrete tokenization. [GitHub]

Autoregressive Video Models

Video In-Context Learning: autoregressive transformers are zero-shot video imitators.
Diagonal Decoding: fast autoregressive video generation with diagonal decoding.

4D World Simulator

Compositional 3D-aware Video Generation: C3V generates each concept separately as a 3D representation, then composes them using priors from large language models (LLMs) and 2D diffusion models.