Spaces: ceymox/TTS_Streaming-AP


ceymox committed on May 2

Commit aea9dd0 (verified) · 1 Parent(s): aef5341

Update README.md

Files changed (1)

README.md +153 -0

@@ -9,5 +9,158 @@ app_file: app.py
 pinned: false
 license: mit
 ---
+# Malayalam TTS with IndicF5
+This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes both a FastAPI backend for programmatic access and a Gradio interface for interactive use.
+## Features
+- Malayalam Text-to-Speech conversion
+- Voice cloning from a reference audio
+- Streaming generation for long text
+- Audio quality enhancement
+- Both API and web interface
+- Docker support for easy deployment
+## Installation
+### Option 1: Local Installation
+1. Clone this repository:
+   ```bash
+   git clone https://github.com/yourusername/malayalam-tts.git
+   cd malayalam-tts
+   ```
+2. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
+   ```bash
+   export HF_TOKEN=your_hugging_face_token
+   ```
+4. Run the application:
+   ```bash
+   python app.py
+   ```
+### Option 2: Docker Installation
+1. Build the Docker image:
+   ```bash
+   docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
+   ```
+2. Run the container:
+   ```bash
+   docker run -p 8000:8000 malayalam-tts
+   ```
+## Usage
+### Web Interface
+Open the Gradio web interface at http://localhost:8000/ and follow these steps:
+1. Enter Malayalam text in the input box
+2. Click "Generate Speech"
+3. Wait for the generation to complete
+4. Listen to or download the generated speech
+### API Endpoints
+The application provides the following API endpoints:
+- `POST /tts`
+  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
+  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`
+- `GET /status/{task_id}`
+  - Check the status of a generation task
+  - Response: `{"status": "processing|completed|error", "progress": 75.0}`
+- `GET /audio/{task_id}`
+  - Download the generated audio file
+  - Returns a WAV file once generation is complete
+- `GET /audio/{task_id}/base64`
+  - Get the audio as a base64-encoded string
+  - Response: `{"audio_base64": "base64_encoded_string"}`
+### Example API Usage
+```python
+import sys
+import time
+import requests
+BASE_URL = "http://localhost:8000"
+# Start TTS generation
+response = requests.post(
+    f"{BASE_URL}/tts",
+    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
+    timeout=30,
+)
+response.raise_for_status()
+task_id = response.json()["task_id"]
+# Poll until the task completes or fails
+while True:
+    status = requests.get(f"{BASE_URL}/status/{task_id}", timeout=30).json()
+    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")
+    if status["status"] == "completed":
+        break
+    if status["status"] == "error":
+        # Stop here: a failed task has no audio to download
+        sys.exit(f"Error: {status.get('error_message')}")
+    time.sleep(1)
+# Download the audio only after the task has completed
+audio = requests.get(f"{BASE_URL}/audio/{task_id}", timeout=60)
+audio.raise_for_status()
+with open("output.wav", "wb") as f:
+    f.write(audio.content)
+print("Audio saved to output.wav")
+```
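The `/audio/{task_id}/base64` endpoint listed above returns the audio inside a JSON body rather than as a file. A minimal sketch of decoding that body on the client side follows. The helper name `decode_audio_response` is hypothetical, and the sketch assumes the response has exactly the `{"audio_base64": ...}` shape documented in the endpoint list.

```python
import base64
import json


def decode_audio_response(body: str) -> bytes:
    """Decode the JSON body of GET /audio/{task_id}/base64 into raw audio bytes.

    Hypothetical helper: assumes the documented {"audio_base64": "..."} shape.
    """
    payload = json.loads(body)
    # validate=True raises an error on characters outside the base64 alphabet
    return base64.b64decode(payload["audio_base64"], validate=True)


if __name__ == "__main__":
    # Simulate a server response with a few placeholder bytes
    sample = b"RIFF\x24\x00\x00\x00WAVEfmt "
    body = json.dumps({"audio_base64": base64.b64encode(sample).decode("ascii")})
    with open("output_from_base64.wav", "wb") as f:
        f.write(decode_audio_response(body))
```

With a running server, the `body` string would instead come from `requests.get(...).text` for the base64 endpoint.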
+## Model Information
+This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, a text-to-speech model that supports multiple Indic languages, including Malayalam.
+## Audio Processing
+The application includes several audio processing techniques to improve quality:
+- Noise reduction
+- Amplitude normalization
+- Gentle compression and limiting
+- Smoothing to reduce artifacts
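The README does not show how these steps are implemented, so the following is only an illustrative sketch of two of them: amplitude normalization and limiting. It works on a plain list of float samples in the range [-1.0, 1.0]. The function names are hypothetical, and the limiter here is a simple hard clip, which is cruder than the "gentle" limiting the list describes.

```python
from typing import List


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or empty input): nothing to scale
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def hard_limit(samples: List[float], ceiling: float = 1.0) -> List[float]:
    """Clamp every sample into [-ceiling, ceiling] (a hard clip, not a soft limiter)."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


if __name__ == "__main__":
    quiet = [0.1, -0.2, 0.05]
    print(normalize_peak(quiet))
    print(hard_limit([1.4, -0.3, -2.0]))
```

Normalizing before limiting means the limiter only acts on samples that still exceed the ceiling after scaling.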
+## Environment Variables
+- `PORT` - Port for the server (default: 8000)
+- `HF_TOKEN` - Hugging Face token for accessing gated models
+- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
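As a rough illustration of how these variables and their documented defaults could be read at startup, here is a minimal sketch. The `Settings` type and `load_settings` function are hypothetical and are not taken from `app.py`; only the variable names and default values come from the list above.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    port: int
    hf_token: Optional[str]
    download_timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        port=int(env.get("PORT", "8000")),
        # Treat an unset or empty token as "no token"
        hf_token=env.get("HF_TOKEN") or None,
        download_timeout=int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary instead of the real process environment.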
+## Troubleshooting
+1. **Model loading issues**
+   - Ensure you have enough disk space for the model (~1.5 GB)
+   - Check your internet connection for download issues
+   - Provide a valid Hugging Face token if needed
+2. **Audio quality issues**
+   - Try different reference audio files
+   - Adjust the text to avoid unusual punctuation
+   - Split very long text into smaller chunks
+3. **Memory errors**
+   - Reduce batch sizes or model parameters
+   - Use a machine with more RAM or GPU memory
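For the suggestion to split very long text into smaller chunks, a client-side sketch is shown below. It groups sentences into chunks of at most `max_chars` characters before each chunk is sent to `POST /tts`. The function name `split_text` and the 200-character default are assumptions for illustration, not values used by the application.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk: start a new one
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("First sentence. Second one! A third?", max_chars=20):
        print(chunk)
```

Each chunk can then be submitted as its own task, and the resulting audio files joined in order.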
+## License
+This project is licensed under the MIT License; see the LICENSE file for details.
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference