---
title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
---
# Malayalam TTS with IndicF5
This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes both a FastAPI backend for programmatic access and a Gradio interface for interactive use.
## Features
- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio clip
- Streaming generation for long text
- Audio quality enhancement
- Both an API and a web interface
- Docker support for easy deployment
## Installation
### Option 1: Local Installation
1. Clone this repository:
```bash
git clone https://github.com/yourusername/malayalam-tts.git
cd malayalam-tts
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
```bash
export HF_TOKEN=your_hugging_face_token
```
4. Run the application:
```bash
python app.py
```
### Option 2: Docker Installation
1. Build the Docker image:
```bash
docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
```
2. Run the container:
```bash
docker run -p 8000:8000 malayalam-tts
```
## Usage
### Web Interface
Access the Gradio web interface at http://localhost:8000/
1. Enter Malayalam text in the input box
2. Click "Generate Speech"
3. Wait for the generation to complete
4. Listen to or download the generated speech
### API Endpoints
The application provides the following API endpoints:
- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`
- `GET /status/{task_id}`
  - Check the status of a generation task
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`
- `GET /audio/{task_id}`
  - Download the generated audio file
  - Returns a WAV file when generation is complete
- `GET /audio/{task_id}/base64`
  - Get the audio as a base64-encoded string
  - Response: `{"audio_base64": "base64_encoded_string"}`
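The base64 endpoint returns JSON rather than raw bytes, so a client has to decode the `audio_base64` field before writing a file. The following is a minimal sketch, assuming the field holds standard base64 text of the WAV bytes. The helper name `decode_audio_payload` is hypothetical and not part of the application, and the sample payload is a stand-in for a real response.

```python
import base64


def decode_audio_payload(payload: dict) -> bytes:
    """Return raw audio bytes from a /audio/{task_id}/base64 response body.

    Assumes the "audio_base64" field holds standard base64 text.
    """
    encoded = payload["audio_base64"]
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(encoded, validate=True)


# Stand-in payload; a real one would come from the JSON body of
# GET http://localhost:8000/audio/{task_id}/base64
sample = {"audio_base64": base64.b64encode(b"RIFF....WAVE").decode("ascii")}
audio_bytes = decode_audio_payload(sample)
```

The returned bytes can then be written to disk with `open("output.wav", "wb")`, as with the plain `/audio/{task_id}` endpoint.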
### Example API Usage
```python
import sys
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation
response = requests.post(
    f"{BASE_URL}/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")
    if status["status"] == "completed":
        break
    if status["status"] == "error":
        # Stop here: a failed task has no audio to download
        sys.exit(f"Error: {status.get('error_message')}")
    time.sleep(1)

# Download the audio
audio = requests.get(f"{BASE_URL}/audio/{task_id}")
audio.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(audio.content)
print("Audio saved to output.wav")
```
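A polling loop like the one above runs forever if a task never leaves the `processing` state. The sketch below is one way to bound the wait. It assumes the status payload shape shown under API Endpoints plus the `error_message` field used in the example. The helper `wait_for_task` and its parameters are hypothetical, not part of the application. The status fetcher is passed in as a callable so the logic can be exercised without a running server.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the task completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "error":
            raise RuntimeError(status.get("error_message", "TTS generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status['status']!r} after {timeout} s")
        sleep(interval)
```

With `requests`, the fetcher could be `lambda: requests.get(f"http://localhost:8000/status/{task_id}").json()`.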
## Model Information
This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, which is a text-to-speech model supporting multiple Indic languages including Malayalam.
## Audio Processing
The application includes several audio processing techniques to improve quality:
- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
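To illustrate two of the steps listed above, here is a minimal sketch of peak normalization and soft limiting on a list of float samples in the range -1.0 to 1.0. It is not the application's actual processing chain, which this README does not describe in detail. The function names, the `target_peak` default, and the choice of `tanh` as the limiter curve are all assumptions made for illustration.

```python
import math


def normalize_peak(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def soft_limit(samples: list[float], ceiling: float = 1.0) -> list[float]:
    """Squash samples smoothly into (-ceiling, ceiling) using tanh."""
    return [ceiling * math.tanh(s / ceiling) for s in samples]


quiet = [0.0, 0.1, -0.2, 0.05]
normalized = normalize_peak(quiet)      # loudest sample scaled to about 0.9
limited = soft_limit([0.5, 2.0, -3.0])  # every value ends up inside (-1, 1)
```

Normalizing first and limiting second keeps the limiter from acting on most of the signal, since only samples near the ceiling are noticeably compressed.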
## Environment Variables
- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
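The variables above could be read with their documented defaults roughly as follows. This is a sketch, not the code in `app.py`. The function name `load_settings` and the keys of the returned dictionary are hypothetical.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", "8000")),
        # No default: the token is only needed for gated models
        "hf_token": env.get("HF_TOKEN") or None,
        "download_timeout": int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    }


# Passing a mapping instead of os.environ makes the behavior easy to check
settings = load_settings({"PORT": "7860"})
```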
## Troubleshooting
1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB)
   - Check your internet connection for download issues
   - Provide a valid Hugging Face token if needed
2. **Audio quality issues**
   - Try different reference audio files
   - Adjust the text to avoid unusual punctuation
   - Split very long text into smaller chunks
3. **Memory errors**
   - Reduce batch sizes or model parameters
   - Use a machine with more RAM or GPU memory
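For the advice about splitting very long text, a client-side splitter along these lines may help. It is a sketch under assumptions: the README does not say how the application's own streaming mode divides text, the function name `split_text` and the 200-character default are hypothetical, and sentence boundaries are detected only at `.`, `!`, or `?` followed by whitespace.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    # Split after ., ! or ? when followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is cut by code-point count,
        # which can split a word or a Malayalam conjunct in the middle
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as a separate `POST /tts` request and the resulting audio files played or joined in order.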
## License
This project is licensed under the MIT License. See the LICENSE file for details.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |