Update app.py
app.py
CHANGED
@@ -251,18 +251,8 @@ def voice_clone(text_audio_path, voice_audio_path):
 speaker_embeddings = speaker_embeddings.to("cuda")
 
 print("Generating speech...")
-
-
-FOR A ROBUST SOLUTION using custom audio, you need a separate model like pyannote.audio or SpeechBrain
-This part of the code assumes that you have already extracted *speaker_embeddings* (x-vector) from the second audio file,
-which contains the voice you want to clone. If not, it will use a a generic pre-defined embedding or raise error.
-This is the trickiest part for direct voice cloning with arbitrary audio using SpeechT5.
-
-For this demo, we'll implement both:
-1. Basic version with predefined speaker embedding (simpler, less true cloning).
-2. Advanced version with SpeechBrain for speaker embedding extraction (more accurate cloning).
-Let's go with the advanced version to meet the "low error" requirement for cloning.
-```
+
+
 
 # Generate speech
 speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)