ceymox committed on
Commit aea9dd0 · verified · 1 Parent(s): aef5341

Update README.md

Files changed (1)
  1. README.md +153 -0
README.md CHANGED
@@ -9,5 +9,158 @@ app_file: app.py
pinned: false
license: mit
---
 
# Malayalam TTS with IndicF5

This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes a FastAPI backend for programmatic access and a Gradio interface for interactive use.

## Features

- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio clip
- Streaming generation for long text
- Audio quality enhancement
- Both an API and a web interface
- Docker support for easy deployment

## Installation

### Option 1: Local Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/yourusername/malayalam-tts.git
   cd malayalam-tts
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
   ```bash
   export HF_TOKEN=your_hugging_face_token
   ```

4. Run the application:
   ```bash
   python app.py
   ```

### Option 2: Docker Installation

1. Build the Docker image:
   ```bash
   docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
   ```

2. Run the container:
   ```bash
   docker run -p 8000:8000 malayalam-tts
   ```

## Usage

### Web Interface

Access the Gradio web interface at http://localhost:8000/

1. Enter Malayalam text in the input box
2. Click "Generate Speech"
3. Wait for the generation to complete
4. Listen to or download the generated speech

### API Endpoints

The application provides the following API endpoints:

- `POST /tts`
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`

- `GET /status/{task_id}`
  - Check the status of a generation task
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`

- `GET /audio/{task_id}`
  - Download the generated audio file
  - Returns a WAV file when generation is complete

- `GET /audio/{task_id}/base64`
  - Get the audio as a base64-encoded string
  - Response: `{"audio_base64": "base64_encoded_string"}`

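The base64 endpoint returns the audio inside a JSON body, so a client has to decode it before saving or playing it. The following is a minimal sketch of that step, not code from this project: the helper names `decode_audio_payload` and `looks_like_wav` are hypothetical, and it assumes the decoded bytes are the same WAV data that `GET /audio/{task_id}` serves.

```python
import base64
import binascii


def decode_audio_payload(payload: dict) -> bytes:
    """Return raw audio bytes from a {"audio_base64": "..."} response body."""
    encoded = payload.get("audio_base64")
    if not isinstance(encoded, str) or not encoded:
        raise ValueError("response has no 'audio_base64' string")
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("audio_base64 is not valid base64") from exc


def looks_like_wav(data: bytes) -> bool:
    """Check for the RIFF/WAVE markers at the start of a WAV file."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

A client could pass the parsed JSON response to `decode_audio_payload`, confirm the result with `looks_like_wav`, and then write the bytes to a `.wav` file.
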
### Example API Usage

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation
response = requests.post(f"{BASE_URL}/tts", json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"})
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")

    if status["status"] == "completed":
        break
    if status["status"] == "error":
        # Stop here: a failed task has no audio to download
        raise SystemExit(f"Error: {status.get('error_message')}")

    time.sleep(1)

# Download the audio, then write it to disk
audio = requests.get(f"{BASE_URL}/audio/{task_id}")
audio.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(audio.content)

print("Audio saved to output.wav")
```

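A polling loop with no upper bound can hang if a task never leaves the `processing` state. One possible variation, sketched here under assumptions, wraps the loop in a helper with a timeout. The name `wait_for_task` and its parameters are hypothetical, not part of this project; the status fetcher is passed in as a callable so the helper does not depend on a particular HTTP client.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    timeout: float = 300.0,
    interval: float = 1.0,
) -> dict:
    """Poll until the task completes; raise on error or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status["status"] == "completed":
            return status
        if status["status"] == "error":
            raise RuntimeError(status.get("error_message", "TTS generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still processing after {timeout} s")
        time.sleep(interval)
```

With `requests`, the fetcher could be `lambda tid: requests.get(f"http://localhost:8000/status/{tid}").json()`.
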
## Model Information

This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, a text-to-speech model that supports multiple Indic languages, including Malayalam.

## Audio Processing

The application includes several audio processing techniques to improve quality:

- Noise reduction
- Amplitude normalization
- Gentle compression and limiting
- Smoothing to reduce artifacts

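To make two of these steps concrete, here is a generic sketch of amplitude normalization and soft limiting. It illustrates the techniques in general and is not this application's implementation, which the README does not describe. The function names, the `target_peak` and `threshold` defaults, and the use of plain Python lists are all assumptions; a real pipeline would more likely operate on NumPy arrays.

```python
import math
from typing import List


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def soft_limit(samples: List[float], threshold: float = 0.8) -> List[float]:
    """Leave samples below the threshold unchanged and squeeze louder ones
    with tanh so the output never exceeds the range [-1, 1]."""
    headroom = 1.0 - threshold
    limited = []
    for s in samples:
        magnitude = abs(s)
        if magnitude <= threshold:
            limited.append(s)
        else:
            squeezed = threshold + headroom * math.tanh((magnitude - threshold) / headroom)
            limited.append(math.copysign(squeezed, s))
    return limited
```

Normalizing first and limiting second keeps most of the signal untouched and only reshapes the loudest peaks.
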
## Environment Variables

- `PORT` - Port for the server (default: 8000)
- `HF_TOKEN` - Hugging Face token for accessing gated models
- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)

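As an illustration of how these variables and their defaults might be read at startup, consider the sketch below. The `load_settings` helper and its dictionary keys are hypothetical and may differ from what `app.py` actually does; the defaults are the ones listed above.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read server settings from the environment, applying the listed defaults."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", "8000")),
        # An unset or empty token means "no authentication"
        "hf_token": env.get("HF_TOKEN") or None,
        "download_timeout": int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    }
```

Accepting an optional mapping makes the helper easy to exercise without touching the real process environment.
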
## Troubleshooting

1. **Model loading issues**
   - Ensure you have enough disk space for the model (~1.5 GB)
   - Check your internet connection for download issues
   - Provide a valid Hugging Face token if needed

2. **Audio quality issues**
   - Try different reference audio files
   - Adjust the text to avoid unusual punctuation
   - Split very long text into smaller chunks

3. **Memory errors**
   - Reduce batch sizes or model parameters
   - Use a machine with more RAM or GPU memory

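The suggestion to split very long text into smaller chunks can be applied on the client side before calling `POST /tts`. The sketch below shows one simple approach that groups whole sentences up to a character limit. The `split_into_chunks` name, the 200-character default, and the punctuation set are assumptions chosen for illustration, not values taken from this project.

```python
import re
from typing import List

# Split at whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own TTS request and the resulting audio files joined afterwards.
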
## License

This project is licensed under the MIT License. See the LICENSE file for details.

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference