Video Tokenization
- VidTok: a cutting-edge family of video tokenizers that excels in both continuous and discrete tokenization. [GitHub]
Autoregressive Video Models
- Video In-Context Learning: autoregressive transformers are zero-shot video imitators.
- Diagonal Decoding: fast autoregressive video generation with diagonal decoding.
4D World Simulator
- Compositional 3D-aware Video Generation: C3V generates each concept separately in a 3D representation, then composes them using priors from large language models (LLMs) and 2D diffusion models.