---
title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
---
# Malayalam TTS with IndicF5

This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes a FastAPI backend for programmatic access and a Gradio interface for interactive use.

## Features

- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio
- Streaming generation for long text
- Audio quality enhancement
- Both API and web interface
- Docker support for easy deployment

## Installation

### Option 1: Local Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/yourusername/malayalam-tts.git
   cd malayalam-tts
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
   ```bash
   export HF_TOKEN=your_hugging_face_token
   ```

4. Run the application:
   ```bash
   python app.py
   ```

### Option 2: Docker Installation

1. Build the Docker image:
   ```bash
   docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
   ```

2. Run the container:
   ```bash
   docker run -p 8000:8000 malayalam-tts
   ```

## Usage

### Web Interface

Access the Gradio web interface at http://localhost:8000/

1. Enter Malayalam text in the input box
2. Click "Generate Speech"
3. Wait for the generation to complete
4. Listen to or download the generated speech

### API Endpoints

The application provides the following API endpoints:

- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`

- `GET /status/{task_id}`
  - Check the status of a generation task
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`

- `GET /audio/{task_id}`
  - Download the generated audio file
  - Returns a WAV file once generation is complete

- `GET /audio/{task_id}/base64`
  - Get the audio as a Base64-encoded string
  - Response: `{"audio_base64": "base64_encoded_string"}`
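
The Base64 endpoint is convenient when the audio has to travel inside JSON. The following is a minimal sketch of decoding such a response on the client, assuming the `audio_base64` field holds standard Base64 without embedded line breaks. The helper names `decode_audio_payload` and `save_audio` are hypothetical and not part of this application.

```python
import base64


def decode_audio_payload(payload: dict) -> bytes:
    """Return raw audio bytes from a /audio/{task_id}/base64 response body."""
    encoded = payload["audio_base64"]
    # validate=True rejects characters outside the Base64 alphabet
    # (assumption: the server sends a single unbroken Base64 string)
    return base64.b64decode(encoded, validate=True)


def save_audio(payload: dict, path: str) -> int:
    """Write the decoded audio to `path` and return the number of bytes written."""
    data = decode_audio_payload(payload)
    with open(path, "wb") as f:
        return f.write(data)


# Stand-in payload for illustration; a real one would come from
# requests.get(f"http://localhost:8000/audio/{task_id}/base64").json()
example_payload = {"audio_base64": base64.b64encode(b"RIFF....WAVE").decode("ascii")}
print(len(decode_audio_payload(example_payload)))
```

If the server ever wraps the string across lines, dropping `validate=True` makes the decoder ignore the extra characters.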

### Example API Usage

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation
response = requests.post(
    f"{BASE_URL}/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")

    if status["status"] == "completed":
        break
    if status["status"] == "error":
        # Stop here so a failed task never reaches the download step
        raise SystemExit(f"Error: {status.get('error_message')}")

    time.sleep(1)

# Download the audio only after generation has completed
audio = requests.get(f"{BASE_URL}/audio/{task_id}")
audio.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(audio.content)

print("Audio saved to output.wav")
```
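
The polling loop above has no upper bound on how long it waits. One possible way to add a timeout is sketched below. `wait_for_task` is a hypothetical helper, and the `error_message` field is assumed from the example above rather than from the endpoint list. The status fetcher is passed in as a callable so the helper can be exercised without a running server.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 1.0,
) -> dict:
    """Poll `get_status` until the task completes, fails, or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "error":
            # Assumption: the server reports failures under "error_message"
            raise RuntimeError(status.get("error_message", "TTS generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        time.sleep(interval)
```

With `requests`, the callable could be `lambda: requests.get(f"http://localhost:8000/status/{task_id}").json()`.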

## Model Information

This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, which is a text-to-speech model supporting multiple Indic languages including Malayalam.

## Audio Processing

The application includes several audio processing techniques to improve quality:
- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
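
To illustrate two of the steps listed above, here is a rough sketch of peak normalization followed by a soft limiter, written in plain Python on a list of float samples in the range -1.0 to 1.0. It is not the code this application runs. The function names, the `target_peak` and `threshold` values, and the tanh-based limiting curve are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the loudest one has magnitude `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def soft_limit(samples: Sequence[float], threshold: float = 0.8) -> List[float]:
    """Leave samples below `threshold` untouched; squash the overshoot so |output| <= 1.0."""
    headroom = 1.0 - threshold
    out = []
    for s in samples:
        mag = abs(s)
        if mag <= threshold:
            out.append(s)
        else:
            # tanh maps any overshoot smoothly into the remaining headroom
            limited = threshold + headroom * math.tanh((mag - threshold) / headroom)
            out.append(math.copysign(limited, s))
    return out


processed = soft_limit(normalize_peak([0.05, -0.2, 0.1]))
```

A real pipeline would more likely operate on NumPy arrays for speed, but the arithmetic is the same.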

## Environment Variables

- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
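
As a sketch of how these variables and their documented defaults might be read at startup: the `Settings` type and `load_settings` function below are hypothetical, not taken from `app.py`.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    port: int
    hf_token: Optional[str]
    download_timeout: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the defaults listed above."""
    return Settings(
        port=int(env.get("PORT", "8000")),
        # Treat an empty token the same as an unset one
        hf_token=env.get("HF_TOKEN") or None,
        download_timeout=int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    )


settings = load_settings({"PORT": "7860"})
print(settings.port, settings.download_timeout)
```

Passing the environment as a mapping keeps the function easy to test. Calling `load_settings()` with no argument reads the real process environment.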

## Troubleshooting

1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB)
   - Check your internet connection for download issues
   - Provide a valid Hugging Face token if needed

2. **Audio quality issues**
   - Try different reference audio files
   - Adjust the text to avoid unusual punctuation
   - Split very long text into smaller chunks

3. **Memory errors**
   - Reduce batch sizes or model parameters
   - Use a machine with more RAM or GPU memory
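
For the advice to split very long text into smaller chunks, one simple client-side approach is to break the text at sentence boundaries and pack sentences into chunks up to a size limit. The sketch below is an assumption about how this could be done. `split_into_chunks`, the 200-character default, and the punctuation set are illustrative and do not describe the application's own streaming logic.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?।])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


for chunk in split_into_chunks("നമസ്കാരം. എങ്ങനെ ഉണ്ട്? സുഖമാണോ?", max_chars=25):
    print(chunk)
```

Each chunk can then be sent to `POST /tts` as a separate request.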

## License

This project is licensed under the MIT License - see the LICENSE file for details.

For the Space configuration options in the front matter, see the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference