
Interactive World Simulator

Learning to simulate the visual world from large-scale videos.


Video Tokenization

  • VidTok: a cutting-edge family of video tokenizers that excels at both continuous and discrete tokenization. [GitHub]

Autoregressive Video Models

  • Video In-Context Learning: autoregressive transformers are zero-shot video imitators.
  • Diagonal Decoding: fast autoregressive video generation with diagonal decoding.

4D World Simulator

  • Compositional 3D-aware Video Generation: C3V generates each concept separately in a 3D representation and then composes them using priors from large language models (LLMs) and 2D diffusion models.