init

README.md CHANGED
@@ -1,194 +1,13 @@

<div align='center'>

<h2>SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation</h2>

<div>
<a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup></a>&nbsp;
<a target='_blank'>Fei Wang <sup>1</sup></a>
</div>
<br>
<div>
<sup>1</sup> Xi'an Jiaotong University &nbsp; <sup>2</sup> Tencent AI Lab &nbsp; <sup>3</sup> Ant Group
</div>
<br>
<i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i>
<br>
<br>

<b>TL;DR: A realistic and stylized talking head video generation method from a single image and audio.</b>
<br>
</div>

## 📋 Changelog

- __2023.03.22__: Launched a new feature: generating the 3D face animation from a single image. New applications based on it will be added.
- __2023.03.22__: Launched a new feature: `still mode`, which produces only a small head pose, via `python inference.py --still`.
- __2023.03.18__: Added `expression intensity` control; you can now change the strength of the generated motion with `python inference.py --expression_scale 1.3` (any value > 1).
- __2023.03.18__: Reorganized the data folders; the checkpoints can now be downloaded automatically with `bash scripts/download_models.sh`.
- __2023.03.18__: Officially integrated [GFPGAN](https://github.com/TencentARC/GFPGAN) for face enhancement; use `python inference.py --enhancer gfpgan` for better visual quality.
- __2023.03.14__: Pinned the version of the `joblib` package to fix errors when using `librosa`; the [Colab demo](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) is online!

<details><summary>Previous changelogs</summary>

- 2023.03.06: Fixed several code bugs and installation errors.
- 2023.03.03: Released the test code for audio-driven single-image animation!
- 2023.02.28: SadTalker has been accepted to CVPR 2023!

</details>

## 🎼 Pipeline



## 🚧 TODO

- [x] Generating a 2D face from a single image.
- [x] Generating a 3D face from audio.
- [x] Generating 4D free-view talking examples from audio and a single image.
- [x] Gradio/Colab demo.
- [ ] Full-body/image generation.
- [ ] Training code for each component.
- [ ] Audio-driven anime avatar.
- [ ] Integrate ChatGPT for a conversation demo 🤔
- [ ] Integrate with stable-diffusion-web-ui (stay tuned!).

https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4

## 🔮 Inference Demo!

#### Dependency Installation

<details><summary>CLICK ME</summary>

```bash
git clone https://github.com/Winfredy/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
source activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install dlib-bin   # dlib-bin installs much faster than building dlib (alternative: conda install dlib)
pip install -r requirements.txt

### install gfpgan for the enhancer
pip install git+https://github.com/TencentARC/GFPGAN
```
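
Optionally, you can check that the GPU build of PyTorch installed correctly before moving on; this is a minimal sketch and not part of the official setup steps:

```bash
# Should print the installed torch version and "True" if the CUDA build can see a GPU.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```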

</details>

#### Trained Models

<details><summary>CLICK ME</summary>

You can run the following script to put all the models in the right place.

```bash
bash scripts/download_models.sh
```

Alternatively, download our pre-trained models from [Google Drive](https://drive.google.com/drive/folders/1Wd88VDoLhVzYsQ30_qDVluQr_Xm46yHT?usp=sharing) or our [GitHub release page](https://github.com/Winfredy/SadTalker/releases/tag/v0.0.1), and then put them in `./checkpoints`.

| Model | Description |
| :--- | :---------- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [the reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis). |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction). |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model from [Wav2lip](https://github.com/Rudrabha/Wav2Lip). |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/). |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in [face-alignment](https://github.com/1adrianb/face-alignment). |

</details>
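
Once the download script (or a manual download) has finished, a quick listing can confirm that the main checkpoint files are in place; this is an optional sketch, not part of the official instructions:

```bash
# Lists the expected checkpoint files; any "No such file" output means something is missing.
ls -lh checkpoints/auido2exp_00300-model.pth \
       checkpoints/auido2pose_00140-model.pth \
       checkpoints/mapping_00229-model.pth.tar \
       checkpoints/facevid2vid_00189-model.pth.tar \
       checkpoints/epoch_20.pth \
       checkpoints/wav2lip.pth
```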

#### Generating 2D face from a single image

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --batch_size <default 2; a larger value runs faster> \
                    --expression_scale <default 1.0; a larger value makes the motion stronger> \
                    --result_dir <folder to store the results> \
                    --enhancer <default None; choose gfpgan or RestoreFormer>
```

<!-- ###### The effectiveness of the enhancer `gfpgan`. -->

| basic | w/ still mode | w/ exp_scale 1.3 | w/ gfpgan |
|:-------------: |:-------------: |:-------------: |:-------------: |
| <video src="https://user-images.githubusercontent.com/4397546/226097707-bef1dd41-403e-48d3-a6e6-6adf923843af.mp4"></video> | <video src='https://user-images.githubusercontent.com/4397546/226804933-b717229f-1919-4bd5-b6af-bea7ab66cad3.mp4'></video> | <video style='width:256px' src="https://user-images.githubusercontent.com/4397546/226806013-7752c308-8235-4e7a-9465-72d8fc1aa03d.mp4"></video> | <video style='width:256px' src="https://user-images.githubusercontent.com/4397546/226097717-12a1a2a1-ac0f-428d-b2cb-bd6917aff73e.mp4"></video> |

> Kindly turn on the audio; videos embedded on GitHub are muted by default.
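
The four columns above correspond, roughly, to the following flag combinations; this is an illustrative sketch and the input paths are placeholders:

```bash
# Illustrative flag combinations; replace the paths with your own audio and image.
python inference.py --driven_audio ./my_audio.wav --source_image ./my_portrait.png                        # basic
python inference.py --driven_audio ./my_audio.wav --source_image ./my_portrait.png --still                # w/ still mode
python inference.py --driven_audio ./my_audio.wav --source_image ./my_portrait.png --expression_scale 1.3 # w/ exp_scale 1.3
python inference.py --driven_audio ./my_audio.wav --source_image ./my_portrait.png --enhancer gfpgan      # w/ gfpgan
```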

<!-- <video src="./docs/art_0##japanese_still.mp4"></video> -->

#### Generating 3D face from Audio

| Input | Animated 3D face |
|:-------------: | :-------------: |
| <img src='examples/source_image/art_0.png' width='200px'> | <video src="https://user-images.githubusercontent.com/4397546/226856847-5a6a0a4d-a5ec-49e2-9b05-3206db65e8e3.mp4"></video> |

> Kindly turn on the audio; videos embedded on GitHub are muted by default.

More details on generating the 3D face can be found [here](docs/face3d.md).

#### Generating 4D free-view talking examples from audio and a single image

We use `camera_yaw`, `camera_pitch`, and `camera_roll` to control the camera pose. For example, `--camera_yaw -20 30 10` means the camera yaw angle changes from -20 to 30 degrees and then from 30 to 10 degrees.

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <folder to store the results> \
                    --camera_yaw -20 30 10
```
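
If you also want to vary pitch and roll, a combined call might look like the sketch below; note that the three-value pattern for `--camera_pitch` and `--camera_roll` is an assumption carried over from the `--camera_yaw` example above, and the paths are placeholders:

```bash
# Sketch only: --camera_pitch and --camera_roll are assumed to take the same
# three-value (start, middle, end) pattern as --camera_yaw.
python inference.py --driven_audio ./my_audio.wav \
                    --source_image ./my_portrait.png \
                    --result_dir ./results \
                    --camera_yaw -20 30 10 \
                    --camera_pitch 0 10 0 \
                    --camera_roll 0 0 5
```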



## 🛎 Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}
```

## 💗 Acknowledgements

Facerender code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. During training, we also use the models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip); we thank the authors for their wonderful work.

## 🥂 Related Works

- [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
- [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)

## 📢 Disclaimer

This is not an official product of Tencent. This repository may only be used for personal, research, and non-commercial purposes.

---
title: SadTalker
emoji: 🦀
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 3.15.0
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference