---
language:
- en
base_model:
- tencent/HunyuanPortrait
pipeline_tag: image-to-video
---

<p align="center">
  <img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanPortrait/refs/heads/main/assets/pics/logo.png"  height=100>
</p>

<div align="center">
<h2><font color="black">HunyuanPortrait</font><br>Implicit Condition Control for Enhanced Portrait Animation</h2>

<a href='https://arxiv.org/abs/2503.18860'><img src='https://img.shields.io/badge/ArXiv-2503.18860-red'></a> 
<a href='https://kkakkkka.github.io/HunyuanPortrait/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://huggingface.co/tencent/HunyuanPortrait'><img src="https://img.shields.io/static/v1?label=HuggingFace&message=HunyuanPortrait&color=yellow"></a>
</div>

## 📜 Requirements
* An NVIDIA RTX 3090 GPU with CUDA support is required (a quick environment check is sketched below).
  * The model is tested on a single 24 GB GPU.
* Tested operating system: Linux
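
Before setting anything up, you can optionally confirm that the driver sees a GPU with roughly 24 GB of VRAM. This check is not part of the original instructions:

```bash
# Optional: list visible NVIDIA GPUs and their total memory.
nvidia-smi --query-gpu=name,memory.total --format=csv
```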

## Installation

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanPortrait
cd HunyuanPortrait
pip3 install torch torchvision torchaudio
pip3 install -r requirements.txt
```
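
Once the dependencies are installed, a one-liner can verify that the installed PyTorch build actually sees your GPU (again an optional check, not part of the original instructions):

```bash
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```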

## Download

All models are stored in `pretrained_weights` by default:
```bash
pip3 install "huggingface_hub[cli]"
mkdir -p pretrained_weights
cd pretrained_weights
huggingface-cli download --resume-download stabilityai/stable-video-diffusion-img2vid-xt --local-dir . --include "*.json"
wget -c https://huggingface.co/LeonJoe13/Sonic/resolve/main/yoloface_v5m.pt
wget -c https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/vae/diffusion_pytorch_model.fp16.safetensors -P vae
wget -c https://huggingface.co/FoivosPar/Arc2Face/resolve/da2f1e9aa3954dad093213acfc9ae75a68da6ffd/arcface.onnx
huggingface-cli download --resume-download tencent/HunyuanPortrait --local-dir hyportrait
```

After downloading, the file structure should look like this:
```bash
.
├── arcface.onnx
├── hyportrait
│   ├── dino.pth
│   ├── expression.pth
│   ├── headpose.pth
│   ├── image_proj.pth
│   ├── motion_proj.pth
│   ├── pose_guider.pth
│   └── unet.pth
├── scheduler
│   └── scheduler_config.json
├── unet
│   └── config.json
├── vae
│   ├── config.json
│   └── diffusion_pytorch_model.fp16.safetensors
└── yoloface_v5m.pt
```
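
Because several separate downloads are involved, a short loop can confirm that nothing is missing. This is an optional sanity check sketched from the tree above; run it from inside `pretrained_weights`:

```bash
# Report any expected weight or config file that is absent.
for f in arcface.onnx yoloface_v5m.pt \
         hyportrait/{dino,expression,headpose,image_proj,motion_proj,pose_guider,unet}.pth \
         scheduler/scheduler_config.json \
         unet/config.json \
         vae/config.json vae/diffusion_pytorch_model.fp16.safetensors; do
  [ -f "$f" ] || echo "missing: $f"
done
```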

## Run

🔥 Bring your portrait to life by executing `bash demo.sh`:

```bash
video_path="your_video.mp4"
image_path="your_image.png"

python inference.py \
    --config config/hunyuan-portrait.yaml \
    --video_path "$video_path" \
    --image_path "$image_path"
```
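
To animate several reference images with the same driving video, the two flags above can simply be looped over. This is a hypothetical convenience wrapper rather than a shipped script, and it assumes your reference images live in a `refs/` directory:

```bash
# Run inference once per reference image, reusing a single driving video.
video_path="your_video.mp4"
for image_path in refs/*.png; do
  python inference.py \
      --config config/hunyuan-portrait.yaml \
      --video_path "$video_path" \
      --image_path "$image_path"
done
```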

## Framework 
<img src="https://raw.githubusercontent.com/Tencent-Hunyuan/HunyuanPortrait/refs/heads/main/assets/pics/pipeline.png">

## TL;DR
HunyuanPortrait is a diffusion-based framework for generating lifelike, temporally consistent portrait animations. It decouples identity and motion using pre-trained encoders: the driving video's expressions and poses are encoded into implicit control signals, which are injected through attention-based adapters into a stabilized diffusion backbone. This enables detailed, style-flexible animation from a single reference image, and the method outperforms existing approaches in controllability and coherence.

# 🖼 Gallery

Some results of portrait animation using HunyuanPortrait can be found on our [Project page](https://kkakkkka.github.io/HunyuanPortrait/).

## Acknowledgements

The code is based on [SVD](https://github.com/Stability-AI/generative-models), [DINOv2](https://github.com/facebookresearch/dinov2), [Arc2Face](https://github.com/foivospar/Arc2Face), and [YoloFace](https://github.com/deepcam-cn/yolov5-face). We thank the authors for their open-sourced code and encourage users to cite their works when applicable.

Stable Video Diffusion is licensed under the Stable Video Diffusion Research License, Copyright (c) Stability AI Ltd. All Rights Reserved. This codebase is intended solely for academic purposes.

# 🎼 Citation
If you find this project helpful, please feel free to leave a star ⭐️⭐️⭐️ and cite our paper:
```bibtex
@article{xu2025hunyuanportrait,
  title={HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation},
  author={Xu, Zunnan and Yu, Zhentao and Zhou, Zixiang and Zhou, Jun and Jin, Xiaoyu and Hong, Fa-Ting and Ji, Xiaozhong and Zhu, Junwei and Cai, Chengfei and Tang, Shiyu and Lin, Qin and Li, Xiu and Lu, Qinglin},
  journal={arXiv preprint arXiv:2503.18860},
  year={2025}
}
```