---
title: ParaLip Video Dubbing
emoji: 🎥
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.0.0"
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# ParaLip Video Dubbing

This Hugging Face Space provides video dubbing using the ParaLip model from the paper [Parallel and High-Fidelity Text-to-Lip Generation](https://arxiv.org/abs/2107.06831). The model generates lip-synchronized video in multiple target languages.

[![arXiv](https://img.shields.io/badge/arXiv-Paper-blue.svg)](https://arxiv.org/abs/2107.06831)
[![GitHub Stars](https://img.shields.io/github/stars/Dianezzy/ParaLip?style=social)](https://github.com/Dianezzy/ParaLip)

## Features

- Upload any video file
- Select target language for dubbing
- Generate lip-synchronized dubbed videos
- Support for multiple languages (Spanish, French, German, Italian, Portuguese)

## How to Use

1. Upload a video file using the video upload interface
2. Select your desired target language from the dropdown menu
3. Click the "Dub Video" button
4. Wait for the processing to complete
5. Download the dubbed video
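
For orientation, the interface wiring might look roughly like the sketch below. The `dub_video` stub and the language list are illustrative assumptions, not the actual contents of `app.py`.

```python
# Hypothetical sketch of the Gradio wiring; names are assumptions.
import gradio as gr

LANGUAGES = ["Spanish", "French", "German", "Italian", "Portuguese"]

def dub_video(video_path: str, language: str) -> str:
    """Placeholder: the real app would run the ParaLip dubbing
    pipeline here and return the path of the dubbed output video."""
    return video_path  # illustrative only

demo = gr.Interface(
    fn=dub_video,
    inputs=[
        gr.Video(label="Input video"),
        gr.Dropdown(choices=LANGUAGES, label="Target language"),
    ],
    outputs=gr.Video(label="Dubbed video"),
    title="ParaLip Video Dubbing",
)

if __name__ == "__main__":
    demo.launch()
```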

## Technical Details

The dubbing pipeline combines several stages:
- Video frame processing
- Lip movement prediction
- Language translation
- Audio synthesis
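
Conceptually, the stages chain as in this hedged outline. Every function below is a placeholder stub standing in for whatever the Space actually calls, not a real ParaLip API.

```python
# Hypothetical pipeline outline; all helpers are placeholder stubs.
from typing import Any

def extract_frames_and_audio(path: str) -> tuple[Any, Any]: ...
def transcribe(audio: Any) -> str: ...
def translate(text: str, lang: str) -> str: ...
def synthesize_speech(text: str, lang: str) -> Any: ...
def predict_lip_movements(frames: Any, text: str) -> Any: ...
def mux_video(frames: Any, audio: Any) -> str: ...

def dub_pipeline(video_path: str, target_language: str) -> str:
    frames, audio = extract_frames_and_audio(video_path)        # video frame processing
    translated = translate(transcribe(audio), target_language)  # language translation
    dubbed_audio = synthesize_speech(translated, target_language)  # audio synthesis
    lip_frames = predict_lip_movements(frames, translated)      # lip movement prediction
    return mux_video(lip_frames, dubbed_audio)                  # reassemble the dubbed video
```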

## Limitations

- Input videos should be clear and well-lit
- Face should be clearly visible in the video
- Processing time depends on video length
- Maximum video length: 5 minutes
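
Since inputs are capped at five minutes, a pre-check along these lines can reject over-long uploads before any heavy processing. The use of `moviepy` here is an assumption, not a confirmed dependency of this Space.

```python
# Sketch of an input-length guard; moviepy (1.x import path) is assumed.
from moviepy.editor import VideoFileClip

MAX_SECONDS = 5 * 60  # the Space's 5-minute cap

def check_duration(video_path: str) -> None:
    """Raise if the uploaded video exceeds the maximum allowed length."""
    with VideoFileClip(video_path) as clip:
        if clip.duration > MAX_SECONDS:
            raise ValueError(
                f"Video is {clip.duration:.0f}s long; maximum allowed is {MAX_SECONDS}s."
            )
```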

## Model Information

This Space uses the ParaLip model, trained on the TCD-TIMIT dataset. The model architecture is based on FastSpeech and includes:
- Transformer-based encoder-decoder
- Duration predictor
- Lip movement generator
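
As a rough mental model only, a FastSpeech-style text-to-lip network can be skeletonized as below. All layer sizes and module names are illustrative assumptions, not the paper's exact configuration.

```python
# Rough PyTorch skeleton of a FastSpeech-style text-to-lip model.
# Sizes and names are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class ParaLipSketch(nn.Module):
    def __init__(self, vocab_size=64, d_model=256, n_heads=4, n_layers=4, lip_dim=40):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        # Duration predictor: frames per input token, as in FastSpeech.
        self.duration_predictor = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )
        dec = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, n_layers)  # non-autoregressive decoder
        self.lip_out = nn.Linear(d_model, lip_dim)  # lip movement generator head

    def forward(self, tokens: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        h = self.encoder(self.embed(tokens))                # token encoder
        durations = self.duration_predictor(h).squeeze(-1)  # predicted frames per token
        # A real model expands h by the predicted durations (length
        # regulator) before decoding; omitted here for brevity.
        lips = self.lip_out(self.decoder(h))                # lip landmark frames
        return lips, durations
```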

## Citation

If you use this model in your research, please cite:
```bibtex
@misc{liu2021paralip,
  title     = {Parallel and High-Fidelity Text-to-Lip Generation},
  author    = {Liu, Jinglin and Zhu, Zhiying and Ren, Yi and Huang, Wencan and Huai, Baoxing and Yuan, Nicholas and Zhao, Zhou},
  year      = {2021},
  publisher = {arXiv},
  doi       = {10.48550/arXiv.2107.06831},
  url       = {https://arxiv.org/abs/2107.06831},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

## License

This project is licensed under the MIT License.

## Acknowledgments

- TCD-TIMIT dataset
- FastSpeech paper and implementation
- Hugging Face Spaces platform
- Original ParaLip implementation by [Dianezzy](https://github.com/Dianezzy/ParaLip)