---
title: ParaLip Video Dubbing
emoji: 🎥
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# ParaLip Video Dubbing

This Hugging Face Space provides video dubbing with ParaLip, the parallel text-to-lip model introduced in [Parallel and High-Fidelity Text-to-Lip Generation](https://arxiv.org/abs/2107.06831). Given an input video and a target language, it generates a lip-synchronized dubbed version.

[Paper](https://arxiv.org/abs/2107.06831) · [Code](https://github.com/Dianezzy/ParaLip)

## Features

- Upload any video file
- Select target language for dubbing
- Generate lip-synchronized dubbed videos
- Support for multiple languages (Spanish, French, German, Italian, Portuguese)

## How to Use

1. Upload a video file using the video upload interface
2. Select your desired target language from the dropdown menu
3. Click the "Dub Video" button
4. Wait for the processing to complete
5. Download the dubbed video
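
Internally, these steps reduce to a single handler that the UI calls with the uploaded file and the chosen language. Below is a minimal sketch of such a handler; the function name, validation logic, and output naming are illustrative assumptions, not the actual `app.py`:

```python
# Hypothetical handler sketch -- not the real implementation in app.py.
SUPPORTED_LANGUAGES = ["Spanish", "French", "German", "Italian", "Portuguese"]

def dub_video(video_path: str, target_language: str) -> str:
    """Validate the request; the real Space would then run the ParaLip pipeline."""
    if not video_path:
        raise ValueError("No video file provided")
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {target_language}")
    # ... dubbing pipeline would run here, writing the result next to the input ...
    stem = video_path.rsplit(".", 1)[0]
    return f"{stem}_{target_language.lower()}.mp4"
```

The Gradio UI simply wires the upload component and dropdown to this function and displays the returned file.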

## Technical Details

The model uses a combination of:

- Video frame processing
- Lip movement prediction
- Language translation
- Audio synthesis
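
These components run as a sequential pipeline: transcribe and translate the speech, predict lip movements for the translated text, synthesize audio, and recombine everything with the frames. A toy sketch of that control flow, where every stage is a string-returning stub standing in for a real model:

```python
# Stub stages -- placeholders that show the order of operations only.
def extract_frames(video):        # video frame processing
    return [f"{video}:frame{i}" for i in range(3)]

def transcribe(video):            # speech -> source-language text
    return "hello world"

def translate(text, lang):        # language translation
    return f"[{lang}] {text}"

def predict_lips(text):           # lip movement prediction
    return f"lips({text})"

def synthesize_audio(text):       # audio synthesis
    return f"audio({text})"

def dub(video, target_language):
    frames = extract_frames(video)
    text = translate(transcribe(video), target_language)
    return {"frames": frames,
            "lips": predict_lips(text),
            "audio": synthesize_audio(text)}
```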

## Limitations

- Input videos should be clear and well-lit
- Face should be clearly visible in the video
- Processing time depends on video length
- Maximum video length: 5 minutes
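
The 5-minute cap can be enforced with a cheap pre-check before any model work starts. A sketch (the function name is an assumption; a real Space would read the duration with a tool such as ffprobe):

```python
MAX_DURATION_SECONDS = 5 * 60  # maximum supported video length (5 minutes)

def check_duration(duration_seconds: float) -> None:
    """Reject videos longer than the supported maximum before processing."""
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError(
            f"Video is {duration_seconds:.0f}s long; the maximum is {MAX_DURATION_SECONDS}s."
        )
```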

## Model Information

This Space uses the ParaLip model, which is trained on the TCD-TIMIT dataset. The architecture is based on FastSpeech and includes:

- Transformer-based encoder-decoder
- Duration predictor
- Lip movement generator
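
In FastSpeech-style models, the duration predictor feeds a length regulator: each encoded text unit is repeated for its predicted number of output frames, so the decoder can generate all frames in parallel. A toy illustration of that expansion step (string labels stand in for hidden-state vectors):

```python
def length_regulate(hidden_states, durations):
    """Repeat each encoder state by its predicted duration
    (FastSpeech-style length regulator)."""
    expanded = []
    for state, duration in zip(hidden_states, durations):
        expanded.extend([state] * duration)
    return expanded
```

For example, two phoneme-level states with predicted durations `[2, 3]` expand to five frame-level states.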

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{liu2021parallel,
  doi       = {10.48550/ARXIV.2107.06831},
  url       = {https://arxiv.org/abs/2107.06831},
  author    = {Liu, Jinglin and Zhu, Zhiying and Ren, Yi and Huang, Wencan and Huai, Baoxing and Yuan, Nicholas and Zhao, Zhou},
  title     = {Parallel and High-Fidelity Text-to-Lip Generation},
  publisher = {arXiv},
  year      = {2021},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

## License

This project is licensed under the MIT License.

## Acknowledgments

- TCD-TIMIT dataset
- FastSpeech paper and implementation
- Hugging Face Spaces platform
- Original ParaLip implementation by [Dianezzy](https://github.com/Dianezzy/ParaLip)