---
license: apache-2.0
base_model:
- stabilityai/stable-video-diffusion-img2vid
pipeline_tag: image-to-video
---
# MotionPro
<p align="center">
<img src="assets/logo.png" width="400"/>
</p>
<p align="center">
🖥️ <a href="https://github.com/HiDream-ai/MotionPro">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🌐 <a href="https://zhw-zhang.github.io/MotionPro-page/"><b>Project Page</b></a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/HiDream-ai/MotionPro/tree/main">Hugging Face</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="https://arxiv.org/abs/2505.20287">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📖 <a href="https://arxiv.org/pdf/2505.20287">PDF</a>
<br>
[**MotionPro: A Precise Motion Controller for Image-to-Video Generation**](https://zhw-zhang.github.io/MotionPro-page/) <br>
🔆 If you find MotionPro useful, please give this repo a ⭐; it means a lot to open-source projects. Thanks!
In this repository, we introduce **MotionPro**, an image-to-video generation model built on Stable Video Diffusion (SVD). MotionPro learns object and camera motion control from **in-the-wild** video datasets (e.g., WebVid-10M) without applying special data filtering. The model offers the following key features:
- **User-friendly interaction.** Our model requires only simple conditional inputs, allowing users to control I2V generation through brushing and dragging.
- **Simultaneous control of object and camera motion.** MotionPro supports controlling object and camera motion at the same time. Moreover, it achieves precise pose-driven camera control without being trained on a dedicated camera-pose paired dataset. [More Details](assets/camera_control.png)
- **Synchronized video generation.** As an extension of our model, combining MotionPro and MotionPro-Dense enables synchronized video generation. [More Details](assets/README_syn.md)
Additionally, our repository provides more tools to benefit the research community:
- **Memory optimization for training.** We provide a training framework based on PyTorch Lightning, optimized for memory efficiency, enabling SVD fine-tuning with a batch size of 8 per NVIDIA A100 GPU.
- **Data construction tools.** We offer scripts for constructing training data, and we provide dataset-loading code for two formats: videos read from folders (Dataset) and from tar files (WebDataset).
- **MC-Bench and evaluation code.** We construct MC-Bench with 1.1K user-annotated image-trajectory pairs and provide evaluation scripts for comprehensive assessment. All images showcased on the project page are included in MC-Bench.
## Video Demos
<div align="center">
<video controls autoplay loop muted playsinline src="https://cdn-uploads.huggingface.co/production/uploads/6496f5754a3c31df8e3139f6/1nWsmo8XhocqTeqHY7OlA.mp4"></video>
<p><em>Examples of different motion control types by our MotionPro.</em></p>
</div>
## 🔥 Updates
- [x] **\[2025.03.26\]** Release inference and training code.
- [x] **\[2025.04.08\]** Release MC-Bench and evaluation code.
- [x] **\[2025.05.20\]** Upload annotation tool for image-trajectory pair construction.
- [x] **\[2025.05.27\]** Upload our arXiv paper.
## πŸƒπŸΌ Inference
<details open>
<summary><strong>Environment Requirement</strong></summary>
Clone the repo:
```
git clone https://github.com/HiDream-ai/MotionPro.git
```
Install dependencies:
```
conda create -n motionpro python=3.10.0
conda activate motionpro
pip install -r requirements.txt
```
</details>
<details open>
<summary><strong>Model Download</strong></summary>
| Models | Download Link | Notes |
|-------------------|-------------------------------------------------------------------------------|--------------------------------------------|
| MotionPro | 🤗 [Hugging Face](https://huggingface.co/HiDream-ai/MotionPro/blob/main/MotionPro-gs_16k.pt) | Supports both object and camera control. This is the default model mentioned in the paper. |
| MotionPro-Dense | 🤗 [Hugging Face](https://huggingface.co/HiDream-ai/MotionPro/blob/main/MotionPro_Dense-gs_14k.pt) | Supports synchronized video generation when combined with MotionPro. MotionPro-Dense shares the same architecture as MotionPro, but its input conditions are modified to include dense optical flow and per-frame visibility masks relative to the first frame. |
Download the models from Hugging Face at high speed (30-75 MB/s):
```
cd tools/huggingface_down
bash download_hfd.sh
```
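Alternatively, the checkpoints can be fetched programmatically with `huggingface_hub`. This is a minimal sketch; the local `checkpoints/` directory is just an example location:
```
# Minimal sketch: download the MotionPro checkpoints with huggingface_hub.
# The "checkpoints/" directory is an arbitrary example location.
from huggingface_hub import hf_hub_download

for filename in ["MotionPro-gs_16k.pt", "MotionPro_Dense-gs_14k.pt"]:
    path = hf_hub_download(
        repo_id="HiDream-ai/MotionPro",
        filename=filename,
        local_dir="checkpoints",
    )
    print(f"Downloaded {filename} to {path}")
```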
</details>
<details open>
<summary><strong>Run Motion Control</strong></summary>
This part of the code supports simultaneous object motion and camera motion control. We provide a user-friendly Gradio demo that lets users control motion with simple brushing and dragging operations. An instructional video is provided in `assets/demo.mp4` (please make sure your Gradio version matches the one specified in `requirements.txt`).
```
python demo_sparse_flex_wh.py
```
When you expect all pixels to move (e.g., for camera control), use the brush to cover the entire image. You can also test the demo using `assets/logo.png`.
Users can also generate controllable image-to-video results from pre-defined camera trajectories. Note that our model has not been trained on a specific camera-control dataset. Test this demo using `assets/sea.png`.
```
python demo_sparse_flex_wh_pure_camera.py
```
</details>
<details open>
<summary><strong>Run synchronized video generation and video recapture</strong></summary>
By combining MotionPro and MotionPro-Dense, we can achieve the following functionalities:
- Synchronized video generation. We assume that two videos, `pure_obj_motion.mp4` and `pure_camera_motion.mp4`, have been generated using the respective demos. By combining their motion flows and using the result as a condition for MotionPro-Dense, we obtain `final_video`. By pairing the same object motion with different camera motions, we can generate `synchronized videos` where the object motion remains consistent while the camera motion varies. [More Details](assets/README_syn.md)
Here, you first need to download the CoTracker [model weights](https://huggingface.co/HiDream-ai/MotionPro/blob/main/tools/co-tracker/checkpoints/scaled_offline.pth) and place them in the `tools/co-tracker/checkpoints` directory.
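If you prefer a programmatic download, here is a minimal `huggingface_hub` sketch (assuming it is run from the repository root, so `local_dir="."` drops the file into `tools/co-tracker/checkpoints/`):
```
# Minimal sketch: fetch the CoTracker checkpoint into tools/co-tracker/checkpoints/.
# Run from the repository root; local_dir="." preserves the sub-path stored in the repo.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="HiDream-ai/MotionPro",
    filename="tools/co-tracker/checkpoints/scaled_offline.pth",
    local_dir=".",
)
```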
```
python inference_dense.py --ori_video 'assets/cases/dog_pure_obj_motion.mp4' --camera_video 'assets/cases/dog_pure_camera_motion_1.mp4' --save_name 'syn_video.mp4' --ckpt_path 'MotionPro-Dense CKPT-PATH'
```
</details>
## 🚀 Training
<details open>
<summary><strong>Data Prepare</strong></summary>
We have packaged several demo videos to help users debug the training code. Simply 🤗 [download](https://huggingface.co/HiDream-ai/MotionPro/tree/main/data), extract the files, and place them in the `./data` directory.
Additionally, `./data/dot_single_video` contains code for processing raw videos using [DOT](https://github.com/16lemoing/dot) to generate the necessary conditions for training, making it easier for the community to create training datasets.
</details>
<details open>
<summary><strong>Train</strong></summary>
Simply run the following command to train MotionPro:
```
bash train_server_1.sh
```
In addition to loading video data from folders, we also support [WebDataset](https://rom1504.github.io/webdataset/), allowing videos to be read directly from tar files for training. This can be enabled by modifying the config file:
```
train_debug_from_folder.yaml -> train_debug_from_tar.yaml
```
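For reference, the sketch below shows how video samples can be iterated from tar shards with the `webdataset` package. The shard pattern and sample keys (`mp4`, `json`) are assumptions for illustration only; the actual sample format and training conditions are defined by the `VidTar` loader in this repo.
```
# Illustrative sketch of iterating video samples from tar shards with WebDataset.
# The shard pattern and the keys ("mp4", "json") are placeholder assumptions;
# the repo's VidTar loader defines the real sample format and conditions.
import webdataset as wds

dataset = (
    wds.WebDataset("data/webvid-{00000..00009}.tar")
    .decode()                  # decode known extensions (e.g., json); others stay raw bytes
    .to_tuple("mp4", "json")   # (raw video bytes, metadata dict) per sample
)

for video_bytes, meta in dataset:
    print(len(video_bytes), meta)
    break
```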
Furthermore, to train the **MotionPro-Dense** model, simply modify the `train_debug_from_tar.yaml` file by changing `VidTar` to `VidTar_all_flow` and updating the `ckpt_path`.
</details>
## πŸ“Evaluation
<strong>MC-Bench</strong>
Simply download 🤗 [MC-Bench](https://huggingface.co/HiDream-ai/MotionPro/blob/main/data/MC-Bench.tar), extract the files, and place them in the `./data` directory.
<strong>Run eval script</strong>
Simply execute the following command to evaluate MotionPro on MC-Bench and WebVid:
```
bash eval_model.sh
```
## 🌟 Star and Citation
If you find our work helpful for your research, please consider giving this repository a ⭐ and citing our work.
```
@inproceedings{2025motionpro,
  title={{MotionPro: A Precise Motion Controller for Image-to-Video Generation}},
  author={Zhongwei Zhang and Fuchen Long and Zhaofan Qiu and Yingwei Pan and Wu Liu and Ting Yao and Tao Mei},
  booktitle={CVPR},
  year={2025}
}
```
## 💖 Acknowledgement
<span id="acknowledgement"></span>
Our code is inspired by several works, including [SVD](https://github.com/Stability-AI/generative-models), [DragNUWA](https://github.com/ProjectNUWA/DragNUWA), [DOT](https://github.com/16lemoing/dot), and [CoTracker](https://github.com/facebookresearch/co-tracker). Thanks to all the contributors!