---
title: TTS Streaming
emoji: 📈
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
license: mit
---
# Malayalam TTS with IndicF5

This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes both a FastAPI backend for programmatic access and a Gradio interface for interactive use.

## Features

- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio clip
- Streaming generation for long text
- Audio quality enhancement
- Both API and web interface
- Docker support for easy deployment

## Installation

### Option 1: Local Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/yourusername/malayalam-tts.git
   cd malayalam-tts
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
   ```bash
   export HF_TOKEN=your_hugging_face_token
   ```

4. Run the application:
   ```bash
   python app.py
   ```

### Option 2: Docker Installation

1. Build the Docker image:
   ```bash
   docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
   ```

2. Run the container:
   ```bash
   docker run -p 8000:8000 malayalam-tts
   ```

## Usage

### Web Interface

Access the Gradio web interface at http://localhost:8000/

1. Enter Malayalam text in the input box
2. Click "Generate Speech"
3. Wait for the generation to complete
4. Listen to or download the generated speech

### API Endpoints

The application provides the following API endpoints:

- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`

- `GET /status/{task_id}`
  - Check the status of a generation task
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`

- `GET /audio/{task_id}`
  - Download the generated audio file
  - Returns a WAV file once generation is complete

- `GET /audio/{task_id}/base64`
  - Get the audio as a base64-encoded string
  - Response: `{"audio_base64": "base64_encoded_string"}`
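
If you use the base64 endpoint, the string has to be decoded before it can be saved as a WAV file. The following is a minimal sketch that assumes the response has exactly the `{"audio_base64": ...}` shape documented above and that the encoded payload is a complete PCM WAV file. `save_base64_audio` is a hypothetical helper, not part of the app. It re-opens the written file with Python's standard `wave` module as a quick validity check; note that `wave` cannot read floating-point WAV files.

```python
import base64
import wave


def save_base64_audio(payload: dict, path: str) -> tuple[int, int]:
    """Decode the `audio_base64` field of a /audio/{task_id}/base64 response,
    write it to `path`, and return (sample_rate, n_frames) as a sanity check.

    Hypothetical helper: assumes the server encodes a complete PCM WAV file.
    """
    audio_bytes = base64.b64decode(payload["audio_base64"])
    with open(path, "wb") as f:
        f.write(audio_bytes)

    # Re-open with the standard-library WAV reader to confirm the file is valid
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnframes()


# Usage (requires the server to be running and a `task_id` from POST /tts):
# import requests
# payload = requests.get(f"http://localhost:8000/audio/{task_id}/base64", timeout=60).json()
# print(save_base64_audio(payload, "output.wav"))
```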

### Example API Usage

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation ("Hello, how are you?" in Malayalam)
response = requests.post(
    f"{BASE_URL}/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
    timeout=30,
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll the status endpoint until the task finishes
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}", timeout=30).json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")

    if status["status"] == "completed":
        break
    if status["status"] == "error":
        # Stop here instead of trying to download audio that was never produced
        raise RuntimeError(f"TTS generation failed: {status.get('error_message')}")

    time.sleep(1)

# Download the generated WAV file
audio = requests.get(f"{BASE_URL}/audio/{task_id}", timeout=60)
audio.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(audio.content)

print("Audio saved to output.wav")
```

## Model Information

This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, a multilingual text-to-speech model that supports several Indic languages, including Malayalam.

## Audio Processing

The application includes several audio processing techniques to improve quality:
- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts
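
The list above does not specify the exact algorithms the app uses. As an illustration only, the sketch below shows one common way to implement three of these steps with NumPy: peak normalization, tanh-based soft limiting, and a short moving-average smoother. It is not the app's implementation, the thresholds are arbitrary assumptions, noise reduction is omitted, and all function names are hypothetical.

```python
import numpy as np


def normalize_peak(audio: np.ndarray, target_peak: float = 0.95) -> np.ndarray:
    """Amplitude normalization: scale so the largest |sample| equals target_peak."""
    audio = np.asarray(audio, dtype=np.float64)
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio.copy()  # pure silence: nothing to scale
    return audio * (target_peak / peak)


def soft_limit(audio: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Gentle limiting: samples below `threshold` pass unchanged; louder ones are
    compressed with tanh so the magnitude approaches, but never exceeds, 1.0."""
    out = np.array(audio, dtype=np.float64)  # float copy; input is not modified
    over = np.abs(out) > threshold
    headroom = 1.0 - threshold
    excess = np.abs(out[over]) - threshold
    out[over] = np.sign(out[over]) * (threshold + headroom * np.tanh(excess / headroom))
    return out


def smooth(audio: np.ndarray, window: int = 3) -> np.ndarray:
    """Smoothing: a short moving average that damps sample-level clicks."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(audio, dtype=np.float64), kernel, mode="same")


def enhance(audio: np.ndarray) -> np.ndarray:
    """Chain the steps: smooth, normalize, then limit any remaining peaks."""
    return soft_limit(normalize_peak(smooth(audio)))
```

Limiting after normalization keeps the overall loudness consistent while guaranteeing the output stays within the [-1, 1] range expected for floating-point audio.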

## Environment Variables

- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
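
A client script or deployment wrapper might read these variables as follows. This is a sketch that mirrors the defaults stated above; it is not taken from `app.py`, and `ServerConfig` and `load_config` are hypothetical names. Since `huggingface_hub` already reads `HF_TOKEN` and `HF_HUB_DOWNLOAD_TIMEOUT` from the environment on its own, a helper like this is mainly useful for validating or logging the effective settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ServerConfig:
    port: int
    hf_token: Optional[str]
    download_timeout: int  # seconds


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read the documented environment variables, falling back to the
    defaults listed in this README (port 8000, 300-second timeout)."""
    return ServerConfig(
        port=int(env.get("PORT", "8000")),
        hf_token=env.get("HF_TOKEN") or None,  # treat an empty string as unset
        download_timeout=int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    )


# Usage: print(load_config())  # reads the current process environment
```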

## Troubleshooting

1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB)
   - Check your internet connection for download issues
   - Provide a valid Hugging Face token if needed

2. **Audio quality issues**
   - Try different reference audio files
   - Avoid unusual punctuation in the input text
   - Split very long text into smaller chunks

3. **Memory errors**
   - Reduce batch sizes or model parameters
   - Use a machine with more RAM or GPU memory
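
Splitting long input into chunks before sending it to `POST /tts` is one way to act on the "split very long text" advice above. The sketch below assumes sentences end with `.`, `?` or `!` followed by whitespace (as in typical modern Malayalam text) and uses an arbitrary 200-character default. `split_into_chunks` is a hypothetical helper, not part of the app, and the app's own streaming generation may split text differently.

```python
import re


def _pack(units: list[str], max_chars: int) -> list[str]:
    """Greedily join units with single spaces without exceeding max_chars.
    A single unit longer than max_chars is kept whole."""
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split at sentence-ending punctuation, break over-long sentences at
    whitespace, then pack the pieces into chunks of at most max_chars."""
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    units: list[str] = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            units.append(sentence)
        else:
            units.extend(_pack(sentence.split(), max_chars))  # word-level fallback
    return _pack(units, max_chars)
```

Each chunk can then be submitted as its own task and the resulting WAV files concatenated. Packing whole sentences keeps prosody natural, since the model never has to start or stop mid-sentence unless a sentence alone exceeds the limit.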

## License

This project is licensed under the MIT License - see the LICENSE file for details.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference