---
title: Dialogue TTS
emoji: 🗣️🎙️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
---
# Dialogue Script to Speech Synthesis
This Hugging Face Space converts dialogue scripts into speech using OpenAI's TTS models (`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`).
## Features
* **Input Script**: Provide a dialogue script with lines in the format `[Speaker] Utterance`.
* **TTS Models**: Choose from `tts-1`, `tts-1-hd`, or `gpt-4o-mini-tts`.
* **Voice Configuration**:
    * **Single Global Voice**: Use one voice for all speakers.
    * **Random per Speaker**: Assigns a unique random voice to each speaker and keeps it consistent within a run.
    * **A/B Round Robin**: Cycles through the available voices, one per unique speaker.
    * **Detailed Per-Speaker UI**: Configure voice, speed (for `tts-1` and `tts-1-hd`), and emotional vibe or custom instructions (for `gpt-4o-mini-tts`) for each speaker individually.
* **Output**:
    * A ZIP file containing individual MP3s for each line.
    * A single merged MP3 of the entire dialogue with custom pauses.
* **Cost Estimation**: Displays an estimated cost before generating audio.
* **NSFW Check**: Optional content safety check using an external API (if `NSFW_API_URL_TEMPLATE` is configured).
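
The two automatic voice methods above can be sketched roughly as follows. This is an illustration, not the code in `app.py`: the helper names (`unique_speakers`, `assign_round_robin`, `assign_random`) are hypothetical, and the voice list assumes the six standard voices of the `tts-1` models, which may differ from what the app offers.

```python
import random
from itertools import cycle

# Assumption: the six standard voices of the tts-1 models; the app's list may differ.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def unique_speakers(speakers):
    """Return speakers in order of first appearance, without duplicates."""
    return list(dict.fromkeys(speakers))


def assign_round_robin(speakers, voices=VOICES):
    """Cycle through the voice list, giving one voice to each unique speaker."""
    return dict(zip(unique_speakers(speakers), cycle(voices)))


def assign_random(speakers, voices=VOICES, seed=None):
    """Give each unique speaker a random voice that stays fixed for the run.

    Voices repeat only after every voice has been handed out once.
    """
    rng = random.Random(seed)
    pool = []
    mapping = {}
    for speaker in unique_speakers(speakers):
        if not pool:
            # Refill and reshuffle once all voices have been used.
            pool = list(voices)
            rng.shuffle(pool)
        mapping[speaker] = pool.pop()
    return mapping
```

Because both helpers return a speaker-to-voice mapping built once per run, every line by the same speaker is synthesized with the same voice.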
## How to Use
1. **Enter your dialogue script** in the text area.
    Example:
    ```
    [Alice] Hello Bob, how are you today?
    [Bob] I'm doing great, Alice! Thanks for asking.
    [Narrator] And so their conversation began.
    ```
2. **Select the TTS Model**.
3. **Set the pause duration** (in milliseconds) between lines for the merged audio.
4. **Choose a Speaker Configuration Method**:
    * If "Single Voice (Global)", select the voice.
    * If "Detailed Configuration...", click "Load/Refresh Per-Speaker Settings UI" and adjust settings for each speaker.
    * Other methods will apply voices automatically.
5. (Optional) Adjust **Global Speed** or **Global Instructions** if applicable to your chosen model and configuration.
6. Click **"Calculate Cost"** to see an estimate.
7. Click **"Generate Audio"**.
8. Download the ZIP file or listen to/download the merged MP3.
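
For reference, a script in the `[Speaker] Utterance` format from step 1 could be parsed along these lines. This is a minimal sketch: `parse_script` is a hypothetical helper, and skipping blank lines while rejecting malformed ones is an assumption about reasonable behavior, not a description of what `app.py` does.

```python
import re

# One dialogue line: "[Speaker] Utterance"
LINE_RE = re.compile(r"^\[([^\]]+)\]\s*(.+)$")


def parse_script(script):
    """Split a dialogue script into (speaker, utterance) pairs."""
    lines = []
    for number, raw in enumerate(script.splitlines(), start=1):
        text = raw.strip()
        if not text:
            continue  # assumption: blank lines are ignored
        match = LINE_RE.match(text)
        if match is None:
            # Assumption: malformed lines are reported rather than silently dropped.
            raise ValueError(f"Line {number} is not in '[Speaker] Utterance' format: {text!r}")
        lines.append((match.group(1).strip(), match.group(2).strip()))
    return lines


example = """[Alice] Hello Bob, how are you today?
[Bob] I'm doing great, Alice! Thanks for asking.
[Narrator] And so their conversation began."""

for speaker, utterance in parse_script(example):
    print(f"{speaker}: {utterance}")
```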
## Secrets
Set the following secrets in the Hugging Face Space settings. Only `OPENAI_API_KEY` is required:
* `OPENAI_API_KEY`: Your OpenAI API key.
* `NSFW_API_URL_TEMPLATE` (Optional): URL template for NSFW checking, e.g., `https://api.example.com/check?text={text}`. The `{text}` placeholder is replaced with the URL-encoded text to check.
* `MODEL_DEFAULT` (Optional): Default TTS model (e.g., `tts-1-hd`).
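
Space secrets are exposed to the app as environment variables, so they can be read roughly as shown here. This is only a sketch: `load_settings` and `build_nsfw_url` are hypothetical names, and falling back to `tts-1-hd` when `MODEL_DEFAULT` is unset is an assumption.

```python
import os
from urllib.parse import quote


def load_settings(env=os.environ):
    """Read the secrets listed above from the environment."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        # None disables the optional NSFW check.
        "nsfw_url_template": env.get("NSFW_API_URL_TEMPLATE"),
        # Assumption: tts-1-hd as the fallback; the app's real default may differ.
        "default_model": env.get("MODEL_DEFAULT", "tts-1-hd"),
    }


def build_nsfw_url(template, text):
    """Fill the {text} placeholder with the URL-encoded text."""
    return template.replace("{text}", quote(text, safe=""))
```

For example, `build_nsfw_url("https://api.example.com/check?text={text}", "Hi there & you")` yields `https://api.example.com/check?text=Hi%20there%20%26%20you`.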
## Smoke Test Script
Use the following script to test basic functionality:
[Gandalf] You shall not pass!
[Frodo] I will take the Ring to Mordor.
[Gandalf] So be it.
Choose your desired model and settings (e.g., "Random per Speaker"), then generate.
## Deployment
This application is designed to be deployed as a Hugging Face Space.
Ensure `ffmpeg` is available (handled by `container.yaml` for Classic Spaces).
Set the necessary secrets in your Space settings on Hugging Face Hub.
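
To confirm that `ffmpeg` is actually on the `PATH` inside the running container, a quick startup check like the following can help. It is a sketch using only the standard library; `ffmpeg_available` is a hypothetical helper, not part of the app.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("Warning: ffmpeg not found; merging MP3s may fail.")
```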