metadata
title: Dialogue TTS
emoji: 🗣️🎙️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
Dialogue Script to Speech Synthesis
This Hugging Face Space converts dialogue scripts into speech using OpenAI's TTS models (tts-1, tts-1-hd, gpt-4o-mini-tts).
Features
- Input Script: Provide a dialogue script with lines in the format [Speaker] Utterance.
- TTS Models: Choose from tts-1, tts-1-hd, or gpt-4o-mini-tts.
- Voice Configuration:
  - Single Global Voice: Use one voice for all speakers.
  - Random per Speaker: Assigns a unique random voice to each speaker consistently within a run.
  - A/B Round Robin: Cycles through available voices for each unique speaker.
  - Detailed Per-Speaker UI: Configure voice, speed (for tts-1/hd), and emotional vibe/custom instructions (for gpt-4o-mini-tts) for each speaker individually.
- Output:
  - A ZIP file containing individual MP3s for each line.
  - A single merged MP3 of the entire dialogue with custom pauses.
- Cost Estimation: Displays an estimated cost before generating audio.
- NSFW Check: Optional content safety check using an external API (if NSFW_API_URL_TEMPLATE is configured).
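The two automatic voice-assignment modes can be sketched as follows. This is a minimal illustration, not the Space's actual code: the function names are hypothetical, and the voice list is an assumption that should be checked against the current OpenAI TTS documentation.

```python
import random

# Assumed voice list for illustration; verify against the OpenAI TTS docs.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def unique_speakers(speakers):
    """Return speakers in order of first appearance, without duplicates."""
    return list(dict.fromkeys(speakers))


def assign_round_robin(speakers, voices=VOICES):
    """Cycle through the voice list, one voice per unique speaker."""
    return {
        speaker: voices[i % len(voices)]
        for i, speaker in enumerate(unique_speakers(speakers))
    }


def assign_random(speakers, voices=VOICES, seed=None):
    """Shuffle the voices once, then hand them out per unique speaker.

    Voices stay distinct while there are at least as many voices as
    speakers; beyond that they are reused. The mapping is built once,
    so each speaker keeps the same voice for the whole run.
    """
    rng = random.Random(seed)
    shuffled = rng.sample(voices, len(voices))
    return {
        speaker: shuffled[i % len(shuffled)]
        for i, speaker in enumerate(unique_speakers(speakers))
    }


mapping = assign_round_robin(["Alice", "Bob", "Alice", "Narrator"])
print(mapping)
```

Building the mapping once per run, keyed by speaker name, is what keeps a speaker's voice stable across all of their lines.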
How to Use
- Enter your dialogue script in the text area.
  Example:
  [Alice] Hello Bob, how are you today?
  [Bob] I'm doing great, Alice! Thanks for asking.
  [Narrator] And so their conversation began.
- Select the TTS Model.
- Set the pause duration (in milliseconds) between lines for the merged audio.
- Choose a Speaker Configuration Method:
  - If "Single Voice (Global)", select the voice.
  - If "Detailed Configuration...", click "Load/Refresh Per-Speaker Settings UI" and adjust settings for each speaker.
  - Other methods will apply voices automatically.
- (Optional) Adjust Global Speed or Global Instructions if applicable to your chosen model and configuration.
- Click "Calculate Cost" to see an estimate.
- Click "Generate Audio".
- Download the ZIP file or listen to/download the merged MP3.
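To make the expected input format concrete, here is a minimal sketch of how a script in the [Speaker] Utterance format could be split into (speaker, utterance) pairs. The regular expression and the `parse_script` helper are assumptions for illustration; the Space's own parser may treat malformed lines differently.

```python
import re

# One utterance per line: "[Speaker] Utterance"
LINE_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]\s*(?P<text>.+)$")


def parse_script(script):
    """Return (speaker, utterance) pairs; blank or malformed lines are skipped."""
    pairs = []
    for raw_line in script.splitlines():
        match = LINE_RE.match(raw_line.strip())
        if match:
            pairs.append((match.group("speaker").strip(), match.group("text").strip()))
    return pairs


example = """[Alice] Hello Bob, how are you today?
[Bob] I'm doing great, Alice! Thanks for asking.
[Narrator] And so their conversation began."""

for speaker, text in parse_script(example):
    print(f"{speaker}: {text}")
```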
Secrets
This Space reads the following secrets, which are set in the Hugging Face Space settings:
- OPENAI_API_KEY: Your OpenAI API key.
- NSFW_API_URL_TEMPLATE (Optional): URL template for NSFW checking, e.g., https://api.example.com/check?text={text}. The {text} placeholder is replaced with the URL-encoded text.
- MODEL_DEFAULT (Optional): Default TTS model (e.g., tts-1-hd).
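Space secrets are exposed to the app as environment variables, so they could be read and used roughly as below. This is a sketch under assumptions: `load_settings` and `build_nsfw_url` are hypothetical helper names, and the fallback model of tts-1 is a guess, not the Space's documented default.

```python
import os
from urllib.parse import quote


def load_settings(env=os.environ):
    """Read the secrets; only OPENAI_API_KEY is treated as mandatory."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "nsfw_template": env.get("NSFW_API_URL_TEMPLATE"),  # None disables the check
        "default_model": env.get("MODEL_DEFAULT", "tts-1"),  # fallback is an assumption
    }


def build_nsfw_url(template, text):
    """Substitute the URL-encoded text for the {text} placeholder."""
    return template.replace("{text}", quote(text, safe=""))


url = build_nsfw_url("https://api.example.com/check?text={text}", "hi there?")
print(url)
```

Passing `safe=""` to `quote` encodes every reserved character, including `/`, `?`, and `&`, so the dialogue text cannot alter the structure of the query string.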
Smoke Test Script
Use the following script to test basic functionality:
[Gandalf] You shall not pass!
[Frodo] I will take the Ring to Mordor.
[Gandalf] So be it.
Choose your desired model and settings (e.g., "Random per Speaker"), then generate.
Deployment
This application is designed to be deployed as a Hugging Face Space.
Ensure ffmpeg is available (handled by container.yaml for Classic Spaces).
Set the necessary secrets in your Space settings on Hugging Face Hub.
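A quick way to confirm that ffmpeg is on the PATH before starting the app is sketched below. The `ffmpeg_available` helper is a hypothetical name, and the warning text assumes the merged MP3 step depends on ffmpeg.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg not found on PATH; merging MP3s will likely fail.")
```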