UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Project | Paper | Hugging Face (UltraVideo Dataset) | Hugging Face (UltraVideo-Long Dataset) | Hugging Face (UltraWan-1K/4K Weights)
pip install diffsynth==1.1.7
pip install "huggingface_hub[cli]"
huggingface-cli download --repo-type model Wan-AI/Wan2.1-T2V-1.3B --local-dir ultrawan_weights/Wan2.1-T2V-1.3B --resume-download
huggingface-cli download --repo-type model APRIL-AIGC/UltraWan --local-dir ultrawan_weights/UltraWan --resume-download
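Before launching inference, it can help to confirm that both downloads landed where the inference commands expect them. The following is a minimal sketch, not part of the released code: `check_ultrawan_weights` is a hypothetical helper name, and the paths it checks are simply the `--model_dir` and `--model_path` values used in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the UltraWan repository): report which of
# the weight paths referenced by the inference commands are missing.
# Usage: check_ultrawan_weights [root_dir]   (root_dir defaults to ultrawan_weights)
check_ultrawan_weights() {
  local root="${1:-ultrawan_weights}"
  local missing=0
  local path
  for path in \
    "$root/Wan2.1-T2V-1.3B" \
    "$root/UltraWan/ultrawan-1k.ckpt" \
    "$root/UltraWan/ultrawan-4k.ckpt"; do
    if [ ! -e "$path" ]; then
      # Print each missing path to stderr so it can be spotted in logs.
      echo "missing: $path" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything exists, 1 otherwise.
  return "$missing"
}
```

Calling `check_ultrawan_weights` with no argument checks the default `ultrawan_weights` directory. A non-zero exit status means at least one path is absent and the corresponding download should be repeated.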
==> one GPU
LoRA_1k: CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25 --usp 0 --height 1088 --width 1920 --num_frames 81 --out_dir output/ultrawan-1k
LoRA_4k: CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5 --usp 0 --height 2160 --width 3840 --num_frames 33 --out_dir output/ultrawan-4k
==> usp with 6 GPUs
LoRA_1k: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt --mode lora --lora_alpha 0.25 --usp 1 --height 1088 --width 1920 --num_frames 81 --out_dir output/ultrawan-1k
LoRA_4k: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --model_path ultrawan_weights/UltraWan/ultrawan-4k.ckpt --mode lora --lora_alpha 0.5 --usp 1 --height 2160 --width 3840 --num_frames 33 --out_dir output/ultrawan-4k
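The 1K and 4K LoRA commands above differ only in `--lora_alpha`, `--height`, `--width`, and `--num_frames` (plus the checkpoint and output paths). As a sketch of how those per-variant settings could be kept in one place, the hypothetical helper below prints the flags for a given variant. The function name is an assumption, and the values are copied from the commands above, not from any official configuration file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the UltraWan repository): print the
# variant-specific flags used by the LoRA inference commands in this README.
# Usage: ultrawan_lora_args 1k|4k
ultrawan_lora_args() {
  local variant="$1"
  case "$variant" in
    1k) echo "--lora_alpha 0.25 --height 1088 --width 1920 --num_frames 81" ;;
    4k) echo "--lora_alpha 0.5 --height 2160 --width 3840 --num_frames 33" ;;
    *)
      echo "unknown variant: $variant (expected 1k or 4k)" >&2
      return 1
      ;;
  esac
}

# Example (single GPU); the unquoted $(...) relies on word splitting:
# CUDA_VISIBLE_DEVICES=0 python infer.py \
#   --model_dir ultrawan_weights/Wan2.1-T2V-1.3B \
#   --model_path ultrawan_weights/UltraWan/ultrawan-1k.ckpt \
#   --mode lora --usp 0 $(ultrawan_lora_args 1k) --out_dir output/ultrawan-1k
```

Keeping the settings in one function avoids mismatches such as pairing the 4K checkpoint with the 1K resolution when commands are edited by hand.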
==> one GPU
ori_1k: CUDA_VISIBLE_DEVICES=0 python infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --mode full --usp 0 --height 1088 --width 1920 --num_frames 81 --out_dir output/ori-1k
==> usp with 6 GPUs
ori_1k: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --standalone --nproc_per_node=6 infer.py --model_dir ultrawan_weights/Wan2.1-T2V-1.3B --mode full --usp 1 --height 1088 --width 1920 --num_frames 81 --out_dir output/ori-1k
huggingface-cli download --repo-type dataset APRIL-AIGC/UltraVideo --local-dir ./UltraVideo --resume-download
For reference, the VBench-style prompts used for UltraVideo in the paper are available in assets/ultravideo_prompts_in_VBench_style.json.
We would like to thank the contributors to the Wan2.1, Qwen, umt5-xxl, diffusers, and HuggingFace repositories for their open research.
If you find our work helpful, please cite us.
@article{ultravideo,
title={UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions},
author={Xue, Zhucun and Zhang, Jiangning and Hu, Teng and He, Haoyang and Chen, Yinan and Cai, Yuxuan and Wang, Yabiao and Wang, Chengjie and Liu, Yong and Li, Xiangtai and Tao, Dacheng},
journal={arXiv preprint arXiv:2506.13691},
year={2025}
}
Base model
Wan-AI/Wan2.1-T2V-1.3B