title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
Malayalam TTS with IndicF5
This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes both a FastAPI backend for programmatic access and a Gradio interface for interactive use.
Features
- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio
- Streaming generation for long text
- Audio quality enhancement
- Both API and web interface
- Docker support for easy deployment
Installation
Option 1: Local Installation
Clone this repository:
git clone https://github.com/yourusername/malayalam-tts.git
cd malayalam-tts
Install dependencies:
pip install -r requirements.txt
(Optional) Set your Hugging Face token as an environment variable to access gated models:
export HF_TOKEN=your_hugging_face_token
Run the application:
python app.py
Option 2: Docker Installation
Build the Docker image:
docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
Run the container:
docker run -p 8000:8000 malayalam-tts
Usage
Web Interface
Access the Gradio web interface at http://localhost:8000/
- Enter Malayalam text in the input box
- Click "Generate Speech"
- Wait for the generation to complete
- Listen to or download the generated speech
API Endpoints
The application provides the following API endpoints:
POST /tts
- Request body:
{"text": "മലയാളം ടെക്സ്റ്റ്"}
- Response:
{"task_id": "unique_id", "message": "TTS generation started"}
GET /status/{task_id}
- Check the status of a generation task
- Response:
{"status": "processing|completed|error", "progress": 75.0}
GET /audio/{task_id}
- Download the generated audio file
- Returns WAV file when generation is complete
GET /audio/{task_id}/base64
- Get the audio as a base64 encoded string
- Response:
{"audio_base64": "base64_encoded_string"}
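For clients that use the base64 endpoint, the returned string has to be decoded back into bytes before it can be played or saved. The following is a minimal sketch, assuming the response has the shape shown above; the helper name `save_base64_audio` is hypothetical and not part of the application.

```python
import base64
from pathlib import Path


def save_base64_audio(payload: dict, path: str) -> int:
    """Decode the "audio_base64" field of a response payload and write it to `path`.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(payload["audio_base64"])
    return Path(path).write_bytes(audio_bytes)
```

A typical call would pass the parsed JSON body of `GET /audio/{task_id}/base64` and a target file name such as `output.wav`.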
Example API Usage
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation
response = requests.post(
    f"{BASE_URL}/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")
    if status["status"] in ("completed", "error"):
        break
    time.sleep(1)

# Download the audio only if generation succeeded
if status["status"] == "completed":
    audio = requests.get(f"{BASE_URL}/audio/{task_id}")
    with open("output.wav", "wb") as f:
        f.write(audio.content)
    print("Audio saved to output.wav")
else:
    print(f"Error: {status.get('error_message')}")
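The polling loop in the example runs until the server reports a final state, so it never returns if a task stalls. One possible refinement is a helper with an overall timeout. This is a sketch only: `wait_for_task` and its parameters are hypothetical, and it takes the status lookup as a callable so it does not depend on a particular HTTP client.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 1.0,
) -> dict:
    """Call `fetch_status` until the task completes or fails, or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # "completed" and "error" are the two final states listed for GET /status.
        if status["status"] in ("completed", "error"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status['status']!r} after {timeout} s")
        time.sleep(interval)
```

With `requests`, the callable could be `lambda: requests.get(f"http://localhost:8000/status/{task_id}").json()`.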
Model Information
This application uses the IndicF5 model from AI4Bharat, which is a text-to-speech model supporting multiple Indic languages including Malayalam.
Audio Processing
The application includes several audio processing techniques to improve quality:
- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
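To make two of the listed steps concrete, here is an illustrative sketch of peak normalization and moving-average smoothing on a list of float samples. It is not the application's actual processing code; the function names, the 0.9 target peak, and the window size are assumptions chosen for the example.

```python
def normalize_peak(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale samples so the largest absolute value equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def moving_average(samples: list[float], window: int = 3) -> list[float]:
    """Smooth samples with a centered moving average (shorter windows at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed
```

A real pipeline would more likely operate on NumPy arrays for speed; plain lists are used here only to keep the example dependency-free.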
Environment Variables
- PORT: Port for the server (default: 8000)
- HF_TOKEN: Hugging Face token for accessing gated models
- HF_HUB_DOWNLOAD_TIMEOUT: Timeout for model downloads (default: 300 seconds)
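A minimal sketch of how these three variables might be read at startup, using the documented defaults. The function name `load_settings` and the returned dictionary keys are hypothetical; the actual app.py may read them differently.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", "8000")),
        "hf_token": env.get("HF_TOKEN"),  # None when no token is set
        "download_timeout": int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    }
```

Calling `load_settings()` with no argument reads from the process environment; passing a plain dictionary is convenient for testing.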
Troubleshooting
Model loading issues
- Ensure you have enough disk space for the model (~1.5 GB)
- Check your internet connection for download issues
- Provide a valid Hugging Face token if needed
Audio quality issues
- Try different reference audio files
- Adjust the text to avoid unusual punctuation
- Split very long text into smaller chunks
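Splitting long text can be done on the client before calling the API. The sketch below groups sentences into chunks under a character limit. The helper name `split_into_chunks`, the 200-character default, and the set of sentence-ending characters are assumptions for illustration, not values taken from the application.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group sentences into chunks of at most `max_chars` characters where possible.

    A single sentence longer than `max_chars` is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?।])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own `POST /tts` request and the resulting audio files joined afterwards.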
Memory errors
- Reduce batch sizes or model parameters
- Use a machine with more RAM or GPU memory
License
This project is licensed under the MIT License. See the LICENSE file for details.
For the Space metadata fields at the top of this file, see the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference