CamTrol

Training-free Camera Control for Video Generation

Chen Hou1, Guoqiang Wei2, Zhibo Chen1

1University of Science and Technology of China, 2ByteDance

A robust, plug-and-play solution to offer camera control for video diffusion models.

Basic Camera Trajectories

Zoom Out Pan Left Tilt Up Truck Right Roll CW
Zoom In Pan Right Tilt Down Pedestal Down Roll ACW
CamTrol produces highly dynamic videos with designated camera moves. No training on specific data is required.

Hybrid and Complex Trajectories

Hybrid: Zoom In first, then Pedestal Up.
Hybrid: Zoom Out + Pedestal Up + Truck Left + Tilt Down + Pan Right
Complex Trajectory I
Complex Trajectory II
Complex Trajectory III
By combining basic motions, CamTrol can handle more complicated camera motions and generate videos with cinematic charm. Benefiting from explicit camera motion modeling, CamTrol can also load pre-defined complex trajectories specified in precise coordinates.

3D Rotation-like Generation

Rotate Anticlockwise
Rotate Anticlockwise
Rotate Clockwise
Rotate Clockwise
CamTrol produces impressive 3D rotation-like videos. Compared with the outputs of 3D generation models, these results are more diverse in style and contain dynamic content.

CamTrol can also handle 3D object generation. From this perspective, CamTrol can be seen as an infinite source of 3D data. The examples are from OmniObject3D.

Multi-Trajectory Generation

Zoom Out Tilt Up Pan Left
Zoom In Tilt Down Pan Right
CamTrol can naturally generate videos of the same scene along different camera trajectories.

Motions at Different Scales

Scale I
Scale II
Scale III
CamTrol supports camera movements at controllable scales.

vs. Prompt Engineering

"A dragon..." "People linger..." "A ceramic cat..." "Roses and birds..." "Volcano explodes..."
+"camera zooms in" +"rotates clockwise" +"rotates anticlockwise" +"pans right" +"zooms in, pedestal up"
+CamTrol +CamTrol +CamTrol +CamTrol +CamTrol
Compared with prompt engineering, CamTrol achieves more accurate camera motion control.

CogVideoX+CamTrol

Zoom In
When applied to more powerful foundation models, CamTrol can produce higher-quality videos.

Method


CamTrol is a two-stage process. In stage I, camera movements are modeled through an explicit 3D point cloud, producing renderings that reflect the specified camera motion. In stage II, the layout prior of noisy latents is used to guide video generation.
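The two stages can be sketched roughly as follows. This is a simplified illustration under assumptions, not the authors' implementation. For stage I, `render_from_new_view` (a hypothetical name) lifts an image to a point cloud using a given depth map and pinhole intrinsics `K`, then reprojects it into a moved camera with a simple z-buffer; how depth is obtained and how holes are filled is not covered. For stage II, `noisy_layout_latent` (also hypothetical) applies the standard DDPM forward process to a clean latent, which is one common way to obtain noisy latents that still carry coarse layout; whether CamTrol uses exactly this step or an inversion-based variant is not stated on this page.

```python
import numpy as np

def render_from_new_view(image, depth, K, T_new_from_old):
    """Stage I sketch: warp an image to a new camera via a point cloud.

    image: (H, W, 3) array, depth: (H, W) positive depths,
    K: (3, 3) pinhole intrinsics, T_new_from_old: (4, 4) rigid transform
    mapping source-camera coordinates to target-camera coordinates.
    Returns the rendered image and a boolean mask of covered pixels.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    # Unproject every pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)  # (3, N)
    # Express the points in the target camera frame.
    pts = T_new_from_old[:3, :3] @ pts + T_new_from_old[:3, 3:4]
    z = pts[2]
    valid = z > 1e-6
    proj = K @ pts
    zs = np.where(valid, z, 1.0)  # avoid dividing by non-positive depth
    u2 = np.round(proj[0] / zs).astype(int)
    v2 = np.round(proj[1] / zs).astype(int)
    valid &= (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    idx = np.flatnonzero(valid)
    # Z-buffer: keep only the nearest point that lands on each pixel.
    zbuf = np.full((H, W), np.inf)
    np.minimum.at(zbuf, (v2[idx], u2[idx]), z[idx])
    idx = idx[z[idx] == zbuf[v2[idx], u2[idx]]]
    out = np.zeros_like(image)
    mask = np.zeros((H, W), dtype=bool)
    out[v2[idx], u2[idx]] = image.reshape(-1, 3)[idx]
    mask[v2[idx], u2[idx]] = True
    return out, mask

def noisy_layout_latent(latent, alpha_bar_t, rng):
    """Stage II sketch: forward-diffuse a clean latent to a given noise level.

    `latent` stands in for the encoded rendering (the encoder is not shown);
    `alpha_bar_t` is the cumulative noise-schedule product at the chosen step.
    """
    eps = rng.standard_normal(latent.shape)
    return np.sqrt(alpha_bar_t) * latent + np.sqrt(1.0 - alpha_bar_t) * eps
```

In this sketch, each pose of a camera trajectory would yield one rendering, and the noised latents of those renderings would serve as the starting point for the video model's denoising. The noise level trades off how strongly the rendered layout constrains the result against how much freedom the model has to repair holes and add dynamics.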

Paper: https://arxiv.org/abs/2406.10126