ceymox / TTS_Streaming-AP

Running

App Files Files Community

TTS_Streaming-AP / README.md

ceymox

Update README.md

aea9dd0 verified 2 months ago

preview code

raw

history blame contribute delete

4.42 kB

	---
	title: TTS Streaming
	emoji: 📈
	colorFrom: gray
	colorTo: red
	sdk: gradio
	sdk_version: 5.25.2
	app_file: app.py
	pinned: false
	license: mit
	---
	# Malayalam TTS with IndicF5

This application provides a Text-to-Speech (TTS) service for the Malayalam language using the IndicF5 model from AI4Bharat. It includes both a FastAPI backend for programmatic access and a Gradio interface for interactive use.

	## Features

	- Malayalam Text-to-Speech conversion
- Voice cloning from a reference audio clip
	- Streaming generation for long text
	- Audio quality enhancement
	- Both API and web interface
	- Docker support for easy deployment

	## Installation

	### Option 1: Local Installation

	1. Clone this repository:
	```bash
	git clone https://github.com/yourusername/malayalam-tts.git
	cd malayalam-tts
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. (Optional) Set your Hugging Face token as an environment variable to access gated models:
	```bash
	export HF_TOKEN=your_hugging_face_token
	```

	4. Run the application:
	```bash
	python app.py
	```

	### Option 2: Docker Installation

	1. Build the Docker image:
	```bash
	docker build -t malayalam-tts --build-arg HF_TOKEN=your_hugging_face_token .
	```

	2. Run the container:
	```bash
	docker run -p 8000:8000 malayalam-tts
	```

	## Usage

	### Web Interface

	Access the Gradio web interface at http://localhost:8000/

	1. Enter Malayalam text in the input box
	2. Click "Generate Speech"
	3. Wait for the generation to complete
	4. Listen to or download the generated speech

	### API Endpoints

	The application provides the following API endpoints:

- `POST /tts`
  - Start a generation task
  - Request body: `{"text": "മലയാളം ടെക്സ്റ്റ്"}`
  - Response: `{"task_id": "unique_id", "message": "TTS generation started"}`

- `GET /status/{task_id}`
  - Check the status of a generation task
  - Response: `{"status": "processing|completed|error", "progress": 75.0}`, where `status` is one of `processing`, `completed`, or `error` and `progress` is a percentage

- `GET /audio/{task_id}`
  - Download the generated audio file
  - Returns the WAV file once generation is complete

- `GET /audio/{task_id}/base64`
  - Get the audio as a base64-encoded string
  - Response: `{"audio_base64": "base64_encoded_string"}`
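The base64 endpoint is handy when the audio has to travel inside JSON. The sketch below decodes an `{"audio_base64": ...}` payload, checks that the bytes parse as a WAV file using Python's standard `wave` module, writes them to disk, and returns the clip duration. It assumes the response shape listed above; the function name `save_base64_wav` is hypothetical and not part of this application.

```python
import base64
import binascii
import io
import wave


def save_base64_wav(payload: dict, path: str) -> float:
    """Decode the `audio_base64` field of a /audio/{task_id}/base64 response,
    validate it as WAV, write it to `path`, and return its duration in seconds."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(payload["audio_base64"], validate=True)
    except (KeyError, binascii.Error) as exc:
        raise ValueError(f"response does not contain valid base64 audio: {exc}") from exc

    # wave.open raises wave.Error if the bytes are not a RIFF/WAVE file
    with wave.open(io.BytesIO(raw), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()

    with open(path, "wb") as f:
        f.write(raw)
    return duration
```

With `requests`, it can be called as `save_base64_wav(requests.get(f"http://localhost:8000/audio/{task_id}/base64", timeout=60).json(), "output.wav")`.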

	### Example API Usage

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# Start TTS generation
response = requests.post(
    f"{BASE_URL}/tts",
    json={"text": "നമസ്കാരം, എങ്ങനെ ഉണ്ട്?"},
    timeout=30,
)
response.raise_for_status()
task_id = response.json()["task_id"]

# Poll until the task completes or fails
while True:
    status = requests.get(f"{BASE_URL}/status/{task_id}", timeout=30).json()
    print(f"Status: {status['status']}, Progress: {status.get('progress', 0)}%")

    if status["status"] == "completed":
        break
    if status["status"] == "error":
        # Stop here instead of trying to download audio that was never produced
        raise SystemExit(f"Error: {status.get('error_message')}")

    time.sleep(1)

# Download the WAV file
audio = requests.get(f"{BASE_URL}/audio/{task_id}", timeout=60)
audio.raise_for_status()
with open("output.wav", "wb") as f:
    f.write(audio.content)

print("Audio saved to output.wav")
```
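The polling loop in the example has no upper bound on how long it waits if a task never reaches `completed` or `error`. A variant with an overall deadline is sketched below. `wait_for_task`, `timeout_s`, and `interval_s` are hypothetical names; the status lookup is passed in as a callable so the helper can wrap `requests` in practice or a stub in tests.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], dict],
    timeout_s: float = 300.0,
    interval_s: float = 1.0,
) -> dict:
    """Poll get_status() until the task completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("status") == "completed":
            return status
        if status.get("status") == "error":
            raise RuntimeError(f"TTS task failed: {status.get('error_message')}")
        # monotonic() is unaffected by wall-clock changes, so the deadline is reliable
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still '{status.get('status')}' after {timeout_s} s")
        time.sleep(interval_s)
```

For the API above, a call might look like `wait_for_task(lambda: requests.get(f"{BASE_URL}/status/{task_id}", timeout=30).json())`.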

	## Model Information

This application uses the [IndicF5](https://huggingface.co/ai4bharat/IndicF5) model from AI4Bharat, a text-to-speech model that supports multiple Indic languages, including Malayalam.

	## Audio Processing

	The application includes several audio processing techniques to improve quality:
	- Noise reduction
	- Amplitude normalization
	- Gentle compression and limiting
	- Smoothing to reduce artifacts
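The README does not spell out the exact post-processing chain, so the following is only an illustrative sketch of three of the listed steps on a list of float samples in [-1, 1]: peak normalization, a static compression curve followed by a hard limiter, and moving-average smoothing. The threshold, ratio, ceiling, and window values are assumptions, noise reduction is not shown, and the compressor is a per-sample transfer curve rather than an envelope-following compressor with attack and release. A real pipeline would more likely use NumPy or SciPy.

```python
def normalize(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale so that the largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def compress_and_limit(
    samples: list[float], threshold: float = 0.6, ratio: float = 3.0, ceiling: float = 0.95
) -> list[float]:
    """Reduce the level above `threshold` by `ratio`, then hard-clip at `ceiling`."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio  # gentle static compression
        mag = min(mag, ceiling)  # limiter: never exceed the ceiling
        out.append(mag if s >= 0 else -mag)
    return out


def smooth(samples: list[float], window: int = 3) -> list[float]:
    """Centered moving average; acts as a mild low-pass filter to soften artifacts."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


def enhance(samples: list[float]) -> list[float]:
    """Hypothetical chain: normalize, then compress/limit, then smooth."""
    return smooth(compress_and_limit(normalize(samples)))
```

Smoothing runs last here because averaging never increases the peak, so the limiter's ceiling still holds after it.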

	## Environment Variables

	- `PORT` - Port for the server (default: 8000)
	- `HF_TOKEN` - Hugging Face token for accessing gated models
	- `HF_HUB_DOWNLOAD_TIMEOUT` - Timeout for model downloads (default: 300 seconds)
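One way to read these variables with the documented defaults is sketched below. It is not the app's actual code; `Settings` and `load_settings` are hypothetical names, and an empty or unset `HF_TOKEN` is treated as "no token" so that public models can still be downloaded.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    port: int
    hf_token: Optional[str]
    download_timeout_s: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read PORT, HF_TOKEN and HF_HUB_DOWNLOAD_TIMEOUT, falling back to the README defaults."""
    return Settings(
        port=int(env.get("PORT", "8000")),
        # an empty string is treated the same as an unset token
        hf_token=env.get("HF_TOKEN") or None,
        download_timeout_s=int(env.get("HF_HUB_DOWNLOAD_TIMEOUT", "300")),
    )
```

Note that `huggingface_hub` reads `HF_HUB_DOWNLOAD_TIMEOUT` from the environment itself when it is imported, so the variable generally needs to be set before that import for the download timeout to take effect.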

	## Troubleshooting

1. Model loading issues
   - Ensure you have enough disk space for the model (~1.5 GB)
   - Check your internet connection if the download fails
   - Provide a valid Hugging Face token if the model is gated

2. Audio quality issues
   - Try a different reference audio file
   - Avoid unusual punctuation in the input text
   - Split very long text into smaller chunks

3. Memory errors
   - Reduce batch sizes or model parameters
   - Use a machine with more RAM or GPU memory
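For splitting long text, a simple approach is to break the input at sentence-ending punctuation and greedily pack sentences into chunks under a character budget. The sketch below is hypothetical (`chunk_text`, `_pack`, and the 200-character default are assumptions, not the app's streaming logic). It treats `.`, `?`, `!`, and the danda `।` as sentence ends, falls back to word boundaries for an over-long sentence, and only cuts inside a word as a last resort, which can split a Malayalam grapheme cluster.

```python
import re

# Whitespace that follows a sentence terminator marks a split point.
_SENTENCE_END = re.compile(r"(?<=[.?!।])\s+")


def _pack(units: list[str], max_chars: int) -> list[str]:
    """Greedily join units with single spaces while staying within max_chars."""
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    units: list[str] = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if len(sentence) <= max_chars:
            units.append(sentence)
            continue
        # Over-long sentence: fall back to word boundaries.
        for word in sentence.split():
            # Last resort for a single over-long word; may cut a grapheme cluster.
            units.extend(word[i:i + max_chars] for i in range(0, len(word), max_chars))
    return _pack([u for u in units if u], max_chars)
```

Each chunk can then be sent as a separate `POST /tts` request and the resulting WAV files concatenated.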

	## License

	This project is licensed under the MIT License - see the LICENSE file for details.
	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference