# One-Shot Free-View Neural Talking Head Synthesis |
|
Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing".
|
|
|
```Python 3.6``` and ```PyTorch 1.7``` are used.
|
|
|
|
|
Updates: |
|
-------- |
|
```2021.11.05``` : |
|
* <s>Replace Jacobian with the rotation matrix (Assuming J = R) to avoid estimating Jacobian.</s> |
|
* Correct the rotation matrix. |
|
|
|
```2021.11.17``` : |
|
* Better Generator, better performance (models and checkpoints have been released). |
|
|
|
Driving | Beta Version | FOMM | New Version: |
|
|
|
|
|
https://user-images.githubusercontent.com/17874285/142828000-db7b324e-c2fd-4fdc-a272-04fb8adbc88a.mp4 |
|
|
|
|
|
-------- |
|
Driving | FOMM | Ours: |
|
 |
|
|
|
Free-View: |
|
 |
|
|
|
Train: |
|
-------- |
|
```
python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7
```
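The `--device_ids` value in the command above looks like a comma-separated list of GPU indices. How `run.py` actually parses it is not shown here, so the following is only a minimal sketch of one way such a flag could be handled with `argparse`. The helper name `parse_device_ids` is hypothetical and is not taken from this repository.

```python
import argparse


def parse_device_ids(text):
    """Turn a string such as "0,1,2" into [0, 1, 2] (hypothetical helper)."""
    ids = [int(part) for part in text.split(",") if part.strip()]
    if not ids:
        raise argparse.ArgumentTypeError("expected at least one GPU index")
    if len(set(ids)) != len(ids):
        raise argparse.ArgumentTypeError("duplicate GPU index in %r" % text)
    return ids


# Assumed flag names mirror the training command; the real parser may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)
parser.add_argument("--device_ids", type=parse_device_ids, default=[0])

args = parser.parse_args(["--config", "config/vox-256.yaml", "--device_ids", "0,1"])
print(args.device_ids)  # [0, 1]
```

To train on fewer GPUs, shorten the list, e.g. `--device_ids 0,1`.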
|
|
|
Demo: |
|
-------- |
|
```
python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame
```
|
Free-view (e.g. yaw=20, pitch=roll=0):
|
```
python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0
```
|
Note: run ```python crop-video.py --inp driving_video.mp4``` first to get the cropping suggestions, then crop the raw video accordingly.
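When rendering the same source at several view angles, it can be handy to assemble the demo command programmatically. The sketch below only builds and prints the command lines and reuses the flags from the commands above. The helper `build_demo_command` is a hypothetical name and not part of this repository, and the paths are the same placeholders as above.

```python
import shlex


def build_demo_command(checkpoint, source_image, driving_video,
                       yaw=None, pitch=None, roll=None,
                       config="config/vox-256.yaml"):
    """Assemble the demo.py argument list (hypothetical helper)."""
    cmd = ["python", "demo.py",
           "--config", config,
           "--checkpoint", checkpoint,
           "--source_image", source_image,
           "--driving_video", driving_video,
           "--relative", "--adapt_scale", "--find_best_frame"]
    angles = (("--yaw", yaw), ("--pitch", pitch), ("--roll", roll))
    # Add --free_view only when at least one angle was requested.
    if any(value is not None for _, value in angles):
        cmd.append("--free_view")
        for flag, value in angles:
            if value is not None:
                cmd += [flag, str(value)]
    return cmd


# Print one command per yaw angle; pitch and roll stay at 0.
for yaw in (-20, 0, 20):
    cmd = build_demo_command("path/to/checkpoint", "path/to/source",
                             "path/to/driving", yaw=yaw, pitch=0, roll=0)
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run` to execute it directly.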
|
|
|
Pretrained Model: |
|
-------- |
|
|
|
Model | Train Set | Baidu Netdisk | Media Fire |
------- |------------ |----------- |-------- |
Vox-256-Beta | VoxCeleb-v1 | [Baidu](https://pan.baidu.com/s/1lLS4ArbK2yWelsL-EtwU8g) (PW: c0tc) | [MF](https://www.mediafire.com/folder/rw51an7tk7bh2/TalkingHead) |
Vox-256-New | VoxCeleb-v1 | - | [MF](https://www.mediafire.com/folder/fcvtkn21j57bb/TalkingHead_Update) |
Vox-512 | VoxCeleb-v2 | soon | soon |
|
|
|
Note: |
|
1. <s>For now, the Beta Version is not well tuned.</s> |
|
2. For free-view synthesis, it is recommended to keep yaw, pitch and roll within ±45°, ±20° and ±20°, respectively.
|
3. Face restoration algorithms (e.g. [GPEN](https://github.com/yangxy/GPEN)) can be used as post-processing to significantly improve the resolution.
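The recommended free-view ranges from note 2 can be enforced before calling the demo. This is a minimal sketch, assuming angles are given in degrees as in the `--yaw`, `--pitch` and `--roll` flags. The names `RECOMMENDED_LIMITS` and `clamp_angles` are hypothetical and not part of this repository.

```python
# Recommended absolute limits in degrees, taken from note 2 above.
RECOMMENDED_LIMITS = {"yaw": 45.0, "pitch": 20.0, "roll": 20.0}


def clamp_angles(yaw, pitch, roll):
    """Clamp each angle into its recommended range (hypothetical helper)."""
    requested = {"yaw": yaw, "pitch": pitch, "roll": roll}
    return {name: max(-limit, min(limit, float(requested[name])))
            for name, limit in RECOMMENDED_LIMITS.items()}


print(clamp_angles(60, -30, 10))
# {'yaw': 45.0, 'pitch': -20.0, 'roll': 10.0}
```

Clamping silently changes the requested view. Raising an error for out-of-range angles is an equally reasonable choice if explicit feedback is preferred.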
|
 |
|
|
|
|
|
Acknowledgement:
|
-------- |
|
Thanks to [NV](https://github.com/NVlabs/face-vid2vid), [AliaksandrSiarohin](https://github.com/AliaksandrSiarohin/first-order-model) and [DeepHeadPose](https://github.com/DriverDistraction/DeepHeadPose). |
|
|