teachingAssistant / utils /tts_README.md
Michael Hu

TTS Structure

This directory contains a Text-to-Speech (TTS) implementation that supports three models:

  1. Kokoro: https://github.com/hexgrad/kokoro
  2. Dia: https://github.com/nari-labs/dia
  3. CosyVoice2: https://github.com/FunAudioLLM/CosyVoice

Structure

The implementation uses one module per engine, plus a shared base module and a single entry point:

  • tts.py: Contains the base TTSBase abstract class and DummyTTS implementation
  • tts_kokoro.py: Kokoro TTS implementation
  • tts_dia.py: Dia TTS implementation
  • tts_cosyvoice2.py: CosyVoice2 TTS implementation
  • tts_main.py: Main entry point for TTS functionality

Usage

# Import the main TTS functions
from utils.tts_main import generate_speech, generate_speech_stream, get_tts_engine

# Generate speech using the best available engine
audio_path = generate_speech("Hello, world!")

# Generate speech using a specific engine
audio_path = generate_speech("Hello, world!", engine_type="kokoro")

# Generate speech with specific parameters
audio_path = generate_speech(
    "Hello, world!",
    engine_type="dia",
    lang_code="en",
    voice="default",
    speed=1.0
)

# Generate speech stream
for sample_rate, audio_data in generate_speech_stream("Hello, world!"):
    # Process audio data
    pass

# Get a specific TTS engine instance
engine = get_tts_engine("kokoro")
audio_path = engine.generate_speech("Hello, world!")

Error Handling

Errors are handled both inside each engine and in the main module:

  1. Each implementation checks for the availability of its dependencies
  2. If a specific engine fails, the call automatically falls back to the DummyTTS implementation
  3. The main module prioritizes engines based on availability
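
The fallback behavior listed above can be illustrated with a minimal sketch. It is not the code in tts_main.py: the helper names `is_available` and `pick_engine`, the `factories` mapping, and the `DummyTTS` stand-in defined here are all assumptions made for illustration.

```python
import importlib.util


class DummyTTS:
    """Stand-in for the DummyTTS fallback (the real class lives in tts.py)."""

    def generate_speech(self, text: str) -> str:
        # Hypothetical placeholder path; the real return value may differ.
        return "dummy.wav"


def is_available(module_name: str) -> bool:
    """Return True if an engine's dependency can be found without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def pick_engine(factories: dict, priority: list):
    """Try engines in priority order; fall back to DummyTTS if all of them fail.

    `factories` maps an engine name to a zero-argument callable that builds it
    (an assumed registry shape, not the one used by tts_main.py).
    """
    for name in priority:
        factory = factories.get(name)
        if factory is None:
            continue  # engine not registered, try the next one
        try:
            return factory()
        except Exception:
            continue  # engine failed to initialize, try the next one
    return DummyTTS()
```

With this shape, a caller always receives an object with a `generate_speech` method, even when no real engine can be loaded.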

Adding New Engines

To add a new TTS engine:

  1. Create a new file tts_<engine_name>.py
  2. Implement a class that inherits from TTSBase
  3. Add the engine to the available engines list in tts_main.py
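
The three steps above might look roughly like the following sketch. The real TTSBase interface is not shown in this README, so the `TTSBase` stand-in, its constructor, and the `generate_speech` signature are assumptions modeled on the usage examples; `SilenceTTS` and `AVAILABLE_ENGINES` are hypothetical names.

```python
import os
import tempfile
import wave
from abc import ABC, abstractmethod


class TTSBase(ABC):
    """Assumed shape of the base class in tts.py (stand-in for illustration)."""

    def __init__(self, lang_code: str = "en"):
        self.lang_code = lang_code

    @abstractmethod
    def generate_speech(self, text: str, voice: str = "default",
                        speed: float = 1.0) -> str:
        """Synthesize `text` and return the path to the audio file."""


class SilenceTTS(TTSBase):
    """Hypothetical engine (would live in tts_silence.py) that writes silence."""

    SAMPLE_RATE = 24000       # samples per second
    SAMPLES_PER_CHAR = 1200   # 50 ms of audio per input character

    def generate_speech(self, text, voice="default", speed=1.0):
        # Longer text or a slower speed produces a longer file.
        n_samples = int(self.SAMPLES_PER_CHAR * len(text) / speed)
        fd, path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)                 # mono
            wav.setsampwidth(2)                 # 16-bit samples
            wav.setframerate(self.SAMPLE_RATE)
            wav.writeframes(b"\x00\x00" * n_samples)
        return path


# Step 3, under an assumed registry shape: map the engine name to its class.
AVAILABLE_ENGINES = {"silence": SilenceTTS}
```

Registering the class under a short name is what would let a call such as `get_tts_engine("silence")` find it, assuming the registry in tts_main.py is keyed by engine name.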