---
title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
---

# Malayalam TTS with IndicF5

This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes a FastAPI backend for programmatic access and a Gradio interface for interactive use.

## Features

- Malayalam Text-to-Speech conversion
- Voice cloning from reference audio
- Streaming generation for long text
- Audio quality enhancement
- Both a REST API and a web interface
- Docker support for easy deployment

## Installation

### Option 1: Local Installation

1. Clone this repository:

   ```shell
   git clone https://github.com/yourusername/malayalam-tts.git
   cd malayalam-tts
   ```

2. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

3. (Optional) Set your Hugging Face token as an environment variable to access gated models:

   ```shell
   export HF_TOKEN=your_hugging_face_token
   ```

4. Run the application:

   ```shell
   python app.py
   ```

### Option 2: Docker Installation

1. Build the Docker image:

   ```shell
   docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
   ```

2. Run the container:

   ```shell
   docker run -p 8000:8000 malayalam-tts
   ```
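Note that baking the token into the image with `--build-arg` leaves it recoverable from the image's layer history. Passing it at run time instead is a common alternative; the command below is a sketch (the token value is a placeholder):

```shell
# Supply HF_TOKEN as a runtime environment variable rather than a build argument,
# so the token never ends up persisted in an image layer.
docker run -p 8000:8000 -e HF_TOKEN=your_hugging_face_token malayalam-tts
```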

## Usage

### Web Interface

Access the Gradio web interface at `http://localhost:8000/`:

1. Enter Malayalam text in the input box.
2. Click "Generate Speech".
3. Wait for the generation to complete.
4. Listen to or download the generated speech.

### API Endpoints

The application provides the following API endpoints:

- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`
- `GET /status/{task_id}`
  - Check the status of a generation task.
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`
- `GET /audio/{task_id}`
  - Download the generated audio file; returns a WAV file once generation is complete.
- `GET /audio/{task_id}/base64`
  - Get the audio as a base64-encoded string.
  - Response: `{"audio_base64": "base64_encoded_string"}`

### Example API Usage

```python
import requests
import time

# Start TTS generation
response = requests.post(
    "http://localhost:8000/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"}
)
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"http://localhost:8000/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")

    if status["status"] == "completed":
        break
    elif status["status"] == "error":
        print(f"Error: {status.get('error_message')}")
        break

    time.sleep(1)

# Download the audio only if generation actually succeeded
if status["status"] == "completed":
    audio = requests.get(f"http://localhost:8000/audio/{task_id}")
    with open("output.wav", "wb") as f:
        f.write(audio.content)
    print("Audio saved to output.wav")
```
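The `/audio/{task_id}/base64` endpoint returns the same audio as a base64 string, which is convenient when the client cannot easily handle binary responses. A minimal sketch of decoding it (the helper name is illustrative; the endpoint URL matches the table above):

```python
import base64

def save_base64_audio(audio_base64: str, path: str) -> None:
    """Decode a base64 audio payload and write the raw WAV bytes to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(audio_base64))

# With a completed task_id, fetch and save the payload like so:
# payload = requests.get(f"http://localhost:8000/audio/{task_id}/base64").json()
# save_base64_audio(payload["audio_base64"], "output.wav")
```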

## Model Information

This application uses the IndicF5 model from AI4Bharat, which is a text-to-speech model supporting multiple Indic languages including Malayalam.

## Audio Processing

The application includes several audio processing techniques to improve quality:

- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
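The actual processing chain lives in `app.py` and is not reproduced here. As a rough illustration of two of these steps, peak normalization and limiting can be sketched as follows (function names and thresholds are illustrative, not the app's actual values):

```python
def normalize_peak(samples, target_peak=0.9):
    """Scale samples so the loudest value reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def limit(samples, threshold=0.95):
    """Hard-clip anything beyond +/- threshold (a crude limiter)."""
    return [max(-threshold, min(threshold, s)) for s in samples]

# Normalize first, then limit, as the list above suggests
processed = limit(normalize_peak([0.1, -0.5, 0.25]))
```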

## Environment Variables

- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
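A sketch of how these variables are typically consumed in Python (the variable names match the table above; the reading code itself is illustrative, not copied from `app.py`):

```python
import os

# Defaults mirror the documented ones above.
port = int(os.environ.get("PORT", "8000"))
hf_token = os.environ.get("HF_TOKEN")  # None when not set
download_timeout = int(os.environ.get("HF_HUB_DOWNLOAD_TIMEOUT", "300"))
```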

## Troubleshooting

1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB).
   - Check your internet connection for download issues.
   - Provide a valid Hugging Face token if needed.
2. **Audio quality issues**
   - Try different reference audio files.
   - Adjust the text to avoid unusual punctuation.
   - Split very long text into smaller chunks.
3. **Memory errors**
   - Reduce batch sizes or model parameters.
   - Use a machine with more RAM or GPU memory.
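For the long-text case, a simple sentence-boundary chunker can be sketched as follows (the chunk size and splitting rule are illustrative; the app's own streaming logic may differ):

```python
import re

def chunk_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole rather than split
    mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `POST /tts` as a separate request.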

## License

This project is licensed under the MIT License; see the LICENSE file for details.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.