suprimedev committed
Commit 94c2b30 · verified · 1 parent: fb8848d

Update app.py

Files changed (1)
  1. app.py +2 -12
app.py CHANGED
@@ -251,18 +251,8 @@ def voice_clone(text_audio_path, voice_audio_path):
  speaker_embeddings = speaker_embeddings.to("cuda")
 
  print("Generating speech...")
- 🐸 SpeechT5 doesn't provide a direct way to extract x-vectors from an arbitrary audio file.
- The 'speaker_embeddings' in examples are usually pre_extracted or comes from dataset
- FOR A ROBUST SOLUTION using custom audio, you need a separate model like pyannote.audio or SpeechBrain
- This part of the code assumes that you have already extracted *speaker_embeddings* (x-vector) from the second audio file,
- which contains the voice you want to clone. If not, it will use a a generic pre-defined embedding or raise error.
- This is the trickiest part for direct voice cloning with arbitrary audio using SpeechT5.
-
- For this demo, we'll implement both:
- 1. Basic version with predefined speaker embedding (simpler, less true cloning).
- 2. Advanced version with SpeechBrain for speaker embedding extraction (more accurate cloning).
- Let's go with the advanced version to meet the "low error" requirement for cloning.
- ```
+
+
 
  # Generate speech
  speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
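The removed comment lines describe the intended flow for `speaker_embeddings`: extract an x-vector from the second (voice) audio file with a separate model such as SpeechBrain, and otherwise fall back to a generic pre-defined embedding or raise an error. The sketch below shows that selection logic under stated assumptions: 512-dimensional x-vectors (the size SpeechT5 speaker embeddings use) and 16 kHz mono reference audio. `select_speaker_embedding`, `encode_fn`, and `default_embedding` are hypothetical names that do not appear in `app.py`; `encode_fn` stands in for whatever extractor is used, and the unit-length normalization mirrors a common SpeechT5 recipe rather than anything this commit confirms.

```python
import math
from typing import Callable, List, Optional, Sequence

# Assumption: SpeechT5 expects 512-dim x-vector speaker embeddings.
XVECTOR_DIM = 512
# Assumption: the speaker encoder was trained on 16 kHz mono audio.
EXPECTED_SAMPLE_RATE = 16_000


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero embedding")
    return [x / norm for x in vec]


def select_speaker_embedding(
    waveform: Optional[Sequence[float]],
    sample_rate: int,
    encode_fn: Callable[[Sequence[float]], Sequence[float]],
    default_embedding: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return a unit-length x-vector for SpeechT5 (hypothetical helper).

    Extracts the embedding from the reference voice audio via `encode_fn`
    when audio is supplied; otherwise falls back to `default_embedding`;
    raises ValueError if neither is available.
    """
    if waveform is not None and len(waveform) > 0:
        if sample_rate != EXPECTED_SAMPLE_RATE:
            raise ValueError(
                f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz"
            )
        embedding = list(encode_fn(waveform))
    elif default_embedding is not None:
        embedding = list(default_embedding)
    else:
        raise ValueError("no reference audio and no default speaker embedding")

    # Reject embeddings of the wrong size before they reach generate_speech.
    if len(embedding) != XVECTOR_DIM:
        raise ValueError(
            f"expected a {XVECTOR_DIM}-dim x-vector, got {len(embedding)}"
        )
    return l2_normalize(embedding)
```

In a real pipeline, `encode_fn` might wrap SpeechBrain's `EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")` and its `encode_batch` method (an assumption about how the app would be wired), and the returned list would become the `(1, 512)` tensor SpeechT5 expects via `torch.tensor(emb).unsqueeze(0)` before the `.to("cuda")` call shown in the diff. Injecting the encoder as a function keeps the fallback and validation testable without downloading a model.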