DragNUWA

DragNUWA is a video generation model that uses text, images, and trajectories as three essential control factors, enabling highly controllable video generation from semantic, spatial, and temporal aspects. Unlike existing work, DragNUWA lets users directly manipulate the background or objects within an image; the model translates these actions into camera movements or object motions and generates the corresponding video.
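To make the trajectory control factor more concrete, here is a minimal sketch, assuming a user's drag is recorded as a polyline of (x, y) pixel coordinates that must be mapped to one position per generated frame. `ControlInputs`, `resample_trajectory`, and `frame_displacements` are hypothetical names used only for illustration; they are not DragNUWA's API, and the linear arc-length resampling shown here is an assumed preprocessing step, not the model's own trajectory handling.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class ControlInputs:
    """Hypothetical container for the three control factors (not DragNUWA's API)."""
    text: str                         # semantic control: the text prompt
    image_path: str                   # spatial control: the starting image (hypothetical field)
    trajectories: List[List[Point]]   # temporal control: one (x, y) polyline per drag


def resample_trajectory(points: List[Point], num_frames: int) -> List[Point]:
    """Resample a drag polyline to one point per frame, evenly spaced by arc length."""
    if num_frames < 2 or len(points) < 2:
        raise ValueError("need at least 2 frames and 2 points")
    # Cumulative arc length at each vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5)
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for f in range(num_frames):
        target = total * f / (num_frames - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def frame_displacements(path: List[Point]) -> List[Tuple[float, float]]:
    """Per-frame (dx, dy) motion between consecutive resampled points."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]
```

For example, a straight 10-pixel drag `[(0.0, 0.0), (10.0, 0.0)]` resampled to 3 frames yields `[(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]`, i.e. a constant displacement of 5 pixels per frame. Resampling by arc length keeps the implied motion speed uniform regardless of how densely the drag was sampled; a real system may instead preserve the user's timing.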

Figure 1 shows how DragNUWA manipulates the same image to create videos with the desired camera movements and object motions.

DragNUWA-Fig1