TTS Structure
This directory contains a Text-to-Speech (TTS) implementation that supports three specific models:
- Kokoro: https://github.com/hexgrad/kokoro
- Dia: https://github.com/nari-labs/dia
- CosyVoice2: https://github.com/FunAudioLLM/CosyVoice
Structure
The TTS implementation is split across five files:
- tts.py: Contains the TTSBase abstract base class and the DummyTTS implementation
- tts_kokoro.py: Kokoro TTS implementation
- tts_dia.py: Dia TTS implementation
- tts_cosyvoice2.py: CosyVoice2 TTS implementation
- tts_main.py: Main entry point for TTS functionality
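To make the split between the base class and the fallback concrete, here is a minimal sketch of what tts.py could look like. The method names generate_speech and generate_speech_stream, the returned audio path, and the (sample_rate, audio_data) chunks come from the Usage section; everything else (constructor arguments, the sine-tone output, the 24000 Hz sample rate, the private _tone helper) is an assumption for illustration and may differ from the real module.

```python
import math
import os
import struct
import tempfile
import wave
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple


class TTSBase(ABC):
    """Assumed shape of the abstract base class; not the actual source."""

    def __init__(self, lang_code: str = "en"):
        self.lang_code = lang_code

    @abstractmethod
    def generate_speech(self, text: str, voice: str = "default",
                        speed: float = 1.0) -> str:
        """Synthesize `text` and return the path of the written audio file."""

    @abstractmethod
    def generate_speech_stream(self, text: str, voice: str = "default",
                               speed: float = 1.0) -> Iterator[Tuple[int, List[float]]]:
        """Yield (sample_rate, audio_data) chunks."""


class DummyTTS(TTSBase):
    """Fallback that needs no model. Here it emits a short sine tone (assumption)."""

    SAMPLE_RATE = 24000  # assumed value

    def _tone(self, seconds: float = 0.25, freq: float = 440.0) -> List[float]:
        n = int(self.SAMPLE_RATE * seconds)
        return [0.3 * math.sin(2 * math.pi * freq * i / self.SAMPLE_RATE)
                for i in range(n)]

    def generate_speech(self, text, voice="default", speed=1.0):
        samples = self._tone()
        fd, path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)   # mono
            wav.setsampwidth(2)   # 16-bit PCM
            wav.setframerate(self.SAMPLE_RATE)
            pcm = [int(s * 32767) for s in samples]
            wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
        return path

    def generate_speech_stream(self, text, voice="default", speed=1.0):
        yield self.SAMPLE_RATE, self._tone()
```

Under this layout, each engine file only has to subclass TTSBase and fill in the two methods, while DummyTTS gives callers something that always works.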
Usage
# Import the main TTS functions
from utils.tts_main import generate_speech, generate_speech_stream, get_tts_engine
# Generate speech using the best available engine
audio_path = generate_speech("Hello, world!")
# Generate speech using a specific engine
audio_path = generate_speech("Hello, world!", engine_type="kokoro")
# Generate speech with specific parameters
audio_path = generate_speech(
    "Hello, world!",
    engine_type="dia",
    lang_code="en",
    voice="default",
    speed=1.0,
)
# Generate speech stream
for sample_rate, audio_data in generate_speech_stream("Hello, world!"):
    # Process audio data
    pass
# Get a specific TTS engine instance
engine = get_tts_engine("kokoro")
audio_path = engine.generate_speech("Hello, world!")
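The streaming loop above leaves the processing step open. One possible use is to collect the chunks into a WAV file, sketched here with the standard library only. The helper save_stream_to_wav and the generator fake_stream are hypothetical, and the sketch assumes each audio_data chunk is a sequence of mono float samples in [-1.0, 1.0]; the real engines may yield a different type (for example NumPy arrays), in which case the conversion step would need adjusting.

```python
import os
import struct
import tempfile
import wave
from typing import Iterable, Sequence, Tuple


def save_stream_to_wav(chunks: Iterable[Tuple[int, Sequence[float]]],
                       path: str) -> int:
    """Write streamed chunks to a mono 16-bit WAV file; return frames written."""
    total = 0
    wav = None
    try:
        for sample_rate, audio_data in chunks:
            if wav is None:
                # Open lazily so the sample rate of the first chunk sets the header.
                wav = wave.open(path, "wb")
                wav.setnchannels(1)
                wav.setsampwidth(2)
                wav.setframerate(sample_rate)
            # Convert floats to clamped 16-bit integers.
            pcm = [max(-32768, min(32767, int(s * 32767))) for s in audio_data]
            wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            total += len(pcm)
    finally:
        if wav is not None:
            wav.close()
    return total


def fake_stream():
    """Stand-in for generate_speech_stream(...) so the sketch runs on its own."""
    yield 24000, [0.0, 0.5, -0.5]
    yield 24000, [1.0, -1.0]


out_path = os.path.join(tempfile.mkdtemp(), "stream.wav")
frames = save_stream_to_wav(fake_stream(), out_path)
```

In real use, fake_stream() would be replaced by generate_speech_stream("Hello, world!").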
Error Handling
Every TTS implementation guards against missing dependencies and runtime failures:
- Each implementation checks for the availability of its dependencies
- If a specific engine fails, it automatically falls back to the DummyTTS implementation
- The main module prioritizes engines based on availability
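The three behaviours above (dependency check, fallback, priority by availability) can be sketched as follows. This is an illustration under stated assumptions, not the code in tts_main.py: is_available, pick_engine, and DummyEngine are hypothetical names, and the real module may detect dependencies differently.

```python
import importlib.util
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("tts")


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


class DummyEngine:
    """Stand-in for DummyTTS so the sketch is self-contained."""

    def generate_speech(self, text: str) -> str:
        return "dummy.wav"


def pick_engine(candidates: List[Tuple[str, Callable[[], object]]],
                fallback: Callable[[], object]) -> object:
    """Return the first engine that is installed and constructs cleanly.

    `candidates` is ordered by priority; each entry pairs the module an
    engine depends on with a zero-argument factory for that engine.
    """
    for module_name, factory in candidates:
        if not is_available(module_name):
            logger.info("Skipping engine: %s is not installed", module_name)
            continue
        try:
            return factory()
        except Exception as exc:  # a failed model load should not crash the app
            logger.warning("Engine backed by %s failed: %s", module_name, exc)
    return fallback()
```

Checking with find_spec avoids importing a heavy model package just to learn that it exists, and wrapping the factory call means a failed model load degrades to the fallback instead of raising.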
Adding New Engines
To add a new TTS engine:
- Create a new file tts_<engine_name>.py
- Implement a class that inherits from TTSBase
- Add the engine to the available engines list in tts_main.py
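The steps above might look like this for an engine called "myengine". All names here are hypothetical: MyEngineTTS, the AVAILABLE_ENGINES registry, and make_engine are placeholders, the TTSBase stub stands in for the real class in tts.py, and the actual list in tts_main.py may be structured differently.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class TTSBase(ABC):
    """Stub of the real base class, reduced to one method for this sketch."""

    @abstractmethod
    def generate_speech(self, text: str, **kwargs) -> str:
        """Return the path of the generated audio file."""


# Steps 1 and 2: this class would live in tts_myengine.py and inherit from TTSBase.
class MyEngineTTS(TTSBase):
    def generate_speech(self, text: str, **kwargs) -> str:
        out_path = "myengine_output.wav"
        # A real engine would run its model here and write audio to out_path.
        return out_path


# Step 3: register the engine (assumed registry shape: name -> class).
AVAILABLE_ENGINES: Dict[str, Type[TTSBase]] = {}
AVAILABLE_ENGINES["myengine"] = MyEngineTTS


def make_engine(engine_type: str) -> TTSBase:
    """Hypothetical lookup, playing the role get_tts_engine has in the Usage section."""
    try:
        return AVAILABLE_ENGINES[engine_type]()
    except KeyError:
        raise ValueError(f"Unknown TTS engine: {engine_type!r}") from None
```

With the engine registered, callers would select it by name in the same way the Usage section selects "kokoro" or "dia".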