SWivid committed on
Commit e938b40 · 1 Parent(s): ddb68ee

add more detailed instructions on inference; address #49 #50

Files changed (4)
  1. README.md +6 -0
  2. gradio_app.py +2 -0
  3. model/utils.py +2 -0
  4. test_infer_single.py +2 -0
README.md CHANGED
@@ -57,11 +57,17 @@ Once your datasets are prepared, you can start the training process.
 accelerate config
 accelerate launch test_train.py
 ```
+An initial guidance on finetuning is given in #57.
 
 ## Inference
 
 To run inference with pretrained models, download the checkpoints from [🤗 Hugging Face](https://huggingface.co/SWivid/F5-TTS).
 
+Currently up to 30 s of generation is supported, which is the **TOTAL** length of the prompt audio and the generated speech. Batch inference with chunks is now supported by the Gradio app.
+- To avoid inference failures, make sure you have read through the following instructions.
+- Uppercase letters will be uttered letter by letter, so use lowercase letters for normal words.
+- Add some spaces (blank: " ") or punctuation (e.g. "," ".") to explicitly introduce some pauses.
+
 ### Single Inference
 
 You can test single inference using the following command. Before running it, modify the config to your needs.
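The input tips above can be sketched as a small text-preparation helper. This is a hypothetical illustration, not part of the repo; the function name `prepare_gen_text` is invented here.

```python
# Hypothetical helper illustrating the README input tips; not part of the repo.
def prepare_gen_text(text: str) -> str:
    # Lowercase normal words so letters are not uttered one by one.
    text = text.lower()
    # Ensure terminal punctuation so the model introduces a pause at the end.
    if text and text[-1] not in ".,!?":
        text = text + ". "
    return text

print(prepare_gen_text("Hello World"))  # hello world. 
```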
gradio_app.py CHANGED
@@ -218,6 +218,8 @@ def infer_batch(ref_audio, ref_text, gen_text_batches, exp_name, remove_silence,
 
     for i, gen_text in enumerate(progress.tqdm(gen_text_batches)):
         # Prepare the text
+        if len(ref_text[-1].encode('utf-8')) == 1:
+            ref_text = ref_text + " "
         text_list = [ref_text + gen_text]
         final_text_list = convert_char_to_pinyin(text_list)
 
model/utils.py CHANGED
@@ -275,6 +275,8 @@ def get_inference_prompt(
     ref_audio = resampler(ref_audio)
 
     # Text
+    if len(prompt_text[-1].encode('utf-8')) == 1:
+        prompt_text = prompt_text + " "
     text = [prompt_text + gt_text]
     if tokenizer == "pinyin":
         text_list = convert_char_to_pinyin(text, polyphone = polyphone)
test_infer_single.py CHANGED
@@ -116,6 +116,8 @@ if sr != target_sample_rate:
 audio = audio.to(device)
 
 # Text
+if len(ref_text[-1].encode('utf-8')) == 1:
+    ref_text = ref_text + " "
 text_list = [ref_text + gen_text]
 if tokenizer == "pinyin":
     final_text_list = convert_char_to_pinyin(text_list)
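The one-byte check that this commit adds in all three files can be shown standalone. A character whose UTF-8 encoding is a single byte is ASCII (e.g. an English letter), so a space is appended to keep the reference text from fusing with the generated text; CJK characters encode to multiple bytes, so no space is needed. The helper name `pad_ref_text` below is illustrative, not from the repo.

```python
# Standalone sketch of the padding check added by this commit (helper name is
# hypothetical). ASCII characters are 1 byte in UTF-8; CJK characters are not.
def pad_ref_text(ref_text: str) -> str:
    if ref_text and len(ref_text[-1].encode("utf-8")) == 1:
        ref_text = ref_text + " "
    return ref_text

print(pad_ref_text("some words"))  # "some words " — trailing ASCII 's' gets a space
print(pad_ref_text("一些文字"))    # unchanged — '字' is 3 bytes in UTF-8
```

The extra guard for an empty `ref_text` avoids the `IndexError` that indexing `[-1]` on an empty string would raise in the unguarded version.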