---
title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
---
# Malayalam TTS with IndicF5
This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes both a FastAPI backend for programmatic access and a Gradio interface for interactive use.
## Features
- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio clip
- Streaming generation for long text
- Audio quality enhancement
- Both API and web interface
- Docker support for easy deployment
## Installation
### Option 1: Local Installation
1. Clone this repository:
```bash
git clone https://github.com/yourusername/malayalam-tts.git
cd malayalam-tts
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
```bash
export HF_TOKEN=your_hugging_face_token
```
4. Run the application:
```bash
python app.py
```
### Option 2: Docker Installation
1. Build the Docker image:
```bash
docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
```
2. Run the container:
```bash
docker run -p 8000:8000 malayalam-tts
```
## Usage
### Web Interface
Access the Gradio web interface at http://localhost:8000/
1. Enter Malayalam text in the input box
2. Click "Generate Speech"
3. Wait for the generation to complete
4. Listen to or download the generated speech
### API Endpoints
The application provides the following API endpoints:
- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`
- `GET /status/{task_id}`
  - Check the status of a generation task
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`
- `GET /audio/{task_id}`
  - Download the generated audio file
  - Returns WAV file when generation is complete
- `GET /audio/{task_id}/base64`
  - Get the audio as a base64 encoded string
  - Response: `{"audio_base64": "base64_encoded_string"}`
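The base64 endpoint is useful when a client cannot easily handle a binary response. Below is a minimal sketch of decoding its payload, assuming the response shape listed above (an `audio_base64` key). The helper name `decode_audio_payload` is hypothetical and not part of this application.

```python
import base64


def decode_audio_payload(payload: dict) -> bytes:
    """Return raw audio bytes from a /audio/{task_id}/base64 response body.

    Hypothetical helper: assumes the JSON body carries an 'audio_base64' key.
    """
    encoded = payload.get("audio_base64")
    if not encoded:
        raise ValueError("response has no 'audio_base64' field")
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(encoded, validate=True)


# Usage (requires the server to be running and a finished task_id):
#   import requests
#   body = requests.get(f"http://localhost:8000/audio/{task_id}/base64").json()
#   with open("output.wav", "wb") as f:
#       f.write(decode_audio_payload(body))
```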
### Example API Usage
```python
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation
response = requests.post(
    f"{BASE_URL}/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")
    if status["status"] in ("completed", "error"):
        break
    time.sleep(1)

# Download the audio only if generation succeeded
if status["status"] == "completed":
    audio = requests.get(f"{BASE_URL}/audio/{task_id}")
    audio.raise_for_status()
    with open("output.wav", "wb") as f:
        f.write(audio.content)
    print("Audio saved to output.wav")
else:
    print(f"Error: {status.get('error_message')}")
```
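The polling loop above runs until the server reports `completed` or `error`, so a stalled task would block the client forever. A minimal sketch of a bounded wait follows. The function name `wait_for_task` and its parameters are hypothetical, and the status fetch is passed in as a callable so the logic does not depend on any particular HTTP client.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 1.0,
) -> dict:
    """Poll get_status() until the task completes, fails, or the timeout expires.

    Hypothetical helper: expects dicts shaped like the /status/{task_id} response.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status["status"] == "completed":
            return status
        if status["status"] == "error":
            raise RuntimeError(status.get("error_message", "TTS generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        time.sleep(interval)


# Usage (requires the server to be running):
#   import requests
#   final = wait_for_task(
#       lambda: requests.get(f"http://localhost:8000/status/{task_id}").json()
#   )
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline unaffected by system clock adjustments.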
## Model Information
This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, which is a text-to-speech model supporting multiple Indic languages including Malayalam.
## Audio Processing
The application includes several audio processing techniques to improve quality:
- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
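To make two of these steps concrete, here is a minimal pure-Python sketch of peak normalization and moving-average smoothing. It is an illustration only and not the application's actual processing code; the function names and default values are assumptions, and noise reduction and compression are omitted.

```python
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def moving_average(samples: Sequence[float], window: int = 3) -> List[float]:
    """Smooth samples with a centered moving average (edges use a shorter window)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed
```

A real pipeline would typically operate on NumPy arrays for speed; plain lists are used here only to keep the sketch dependency-free.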
## Environment Variables
- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
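A minimal sketch of reading these variables with the documented defaults is shown below. How `app.py` actually reads them is not shown in this README, so the function name `load_settings` and the returned keys are assumptions.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read server settings, falling back to the defaults documented above.

    Hypothetical helper: pass a mapping to override os.environ (useful in tests).
    """
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", "8000")),
        "hf_token": env.get("HF_TOKEN"),  # None when the variable is unset
        "download_timeout": int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    }
```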
## Troubleshooting
1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB)
   - Check your internet connection for download issues
   - Provide a valid Hugging Face token if needed
2. **Audio quality issues**
   - Try different reference audio files
   - Adjust the text to avoid unusual punctuation
   - Split very long text into smaller chunks
3. **Memory errors**
   - Reduce batch sizes or model parameters
   - Use a machine with more RAM or GPU memory
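For the advice to split very long text into smaller chunks, a client-side sketch might look like the following. The function name `split_text` and the 200-character default are assumptions, not values taken from this application. Each chunk could then be sent as a separate `POST /tts` request.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is hard-split by character
        # count, which may cut through a word or a conjunct.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```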
## License
This project is licensed under the MIT License - see the LICENSE file for details.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference