---
title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
---

# Malayalam TTS with IndicF5

This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes a FastAPI backend for programmatic access and a Gradio interface for interactive use.

## Features

- Malayalam Text-to-Speech conversion
- Voice cloning from reference audio
- Streaming generation for long text
- Audio quality enhancement
- Both a REST API and a web interface
- Docker support for easy deployment

## Installation

### Option 1: Local Installation

1. Clone this repository:

   ```shell
   git clone https://github.com/yourusername/malayalam-tts.git
   cd malayalam-tts
   ```

2. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

3. (Optional) Set your Hugging Face token as an environment variable to access gated models:

   ```shell
   export HF_TOKEN=your_hugging_face_token
   ```

4. Run the application:

   ```shell
   python app.py
   ```

### Option 2: Docker Installation

1. Build the Docker image:

   ```shell
   docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
   ```

2. Run the container:

   ```shell
   docker run -p 8000:8000 malayalam-tts
   ```
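Note that baking the token into the image with `--build-arg` leaves it recoverable from the image's layer history. Passing it at run time instead is a common alternative; the command below is a sketch (the token value is a placeholder):

```shell
# Supply HF_TOKEN as a runtime environment variable rather than a build argument,
# so the token never ends up persisted in an image layer.
docker run -p 8000:8000 -e HF_TOKEN=your_hugging_face_token malayalam-tts
```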

## Usage

### Web Interface

Access the Gradio web interface at `http://localhost:8000/`:

1. Enter Malayalam text in the input box.
2. Click "Generate Speech".
3. Wait for the generation to complete.
4. Listen to or download the generated speech.

### API Endpoints

The application provides the following API endpoints:

- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`
- `GET /status/{task_id}`
  - Check the status of a generation task.
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`
- `GET /audio/{task_id}`
  - Download the generated audio file; returns a WAV file once generation is complete.
- `GET /audio/{task_id}/base64`
  - Get the audio as a base64-encoded string.
  - Response: `{"audio_base64": "base64_encoded_string"}`

### Example API Usage

```python
import requests
import time

# Start TTS generation
response = requests.post(
    "http://localhost:8000/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"}
)
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"http://localhost:8000/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")

    if status["status"] == "completed":
        break
    elif status["status"] == "error":
        print(f"Error: {status.get('error_message')}")
        break

    time.sleep(1)

# Download the audio only if generation actually succeeded
if status["status"] == "completed":
    audio = requests.get(f"http://localhost:8000/audio/{task_id}")
    with open("output.wav", "wb") as f:
        f.write(audio.content)
    print("Audio saved to output.wav")
```
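The `/audio/{task_id}/base64` endpoint returns the same audio as a base64 string, which is convenient when the client cannot easily handle binary responses. A minimal sketch of decoding it (the helper name is illustrative; the endpoint URL matches the table above):

```python
import base64

def save_base64_audio(audio_base64: str, path: str) -> None:
    """Decode a base64 audio payload and write the raw WAV bytes to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(audio_base64))

# With a completed task_id, fetch and save the payload like so:
# payload = requests.get(f"http://localhost:8000/audio/{task_id}/base64").json()
# save_base64_audio(payload["audio_base64"], "output.wav")
```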

## Model Information

This application uses the IndicF5 model from AI4Bharat, which is a text-to-speech model supporting multiple Indic languages including Malayalam.

## Audio Processing

The application includes several audio processing techniques to improve quality:

- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
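The actual processing chain lives in `app.py` and is not reproduced here. As a rough illustration of two of these steps, peak normalization and limiting can be sketched as follows (function names and thresholds are illustrative, not the app's actual values):

```python
def normalize_peak(samples, target_peak=0.9):
    """Scale samples so the loudest value reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def limit(samples, threshold=0.95):
    """Hard-clip anything beyond +/- threshold (a crude limiter)."""
    return [max(-threshold, min(threshold, s)) for s in samples]

# Normalize first, then limit, as the list above suggests
processed = limit(normalize_peak([0.1, -0.5, 0.25]))
```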

## Environment Variables

- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
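A sketch of how these variables are typically consumed in Python (the variable names match the table above; the reading code itself is illustrative, not copied from `app.py`):

```python
import os

# Defaults mirror the documented ones above.
port = int(os.environ.get("PORT", "8000"))
hf_token = os.environ.get("HF_TOKEN")  # None when not set
download_timeout = int(os.environ.get("HF_HUB_DOWNLOAD_TIMEOUT", "300"))
```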

## Troubleshooting

1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB).
   - Check your internet connection for download issues.
   - Provide a valid Hugging Face token if needed.
2. **Audio quality issues**
   - Try different reference audio files.
   - Adjust the text to avoid unusual punctuation.
   - Split very long text into smaller chunks.
3. **Memory errors**
   - Reduce batch sizes or model parameters.
   - Use a machine with more RAM or GPU memory.
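For the long-text case, a simple sentence-boundary chunker can be sketched as follows (the chunk size and splitting rule are illustrative; the app's own streaming logic may differ):

```python
import re

def chunk_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than split
    mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `POST /tts` as a separate request.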

## License

This project is licensed under the MIT License; see the LICENSE file for details.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.