---
title: Dialogue TTS
emoji: 🗣️🎙️
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
---

# Dialogue Script to Speech Synthesis

This Hugging Face Space converts dialogue scripts into speech using OpenAI's TTS models (`tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`).

## Features

*   **Input Script**: Provide a dialogue script with lines in the format `[Speaker] Utterance`.
*   **TTS Models**: Choose from `tts-1`, `tts-1-hd`, or `gpt-4o-mini-tts`.
*   **Voice Configuration**:
    *   **Single Global Voice**: Use one voice for all speakers.
    *   **Random per Speaker**: Assigns a unique random voice to each speaker consistently within a run.
    *   **A/B Round Robin**: Cycles through available voices for each unique speaker.
    *   **Detailed Per-Speaker UI**: Configure the voice, speed (for `tts-1` and `tts-1-hd`), and emotional vibe or custom instructions (for `gpt-4o-mini-tts`) for each speaker individually.
*   **Output**:
    *   A ZIP file containing individual MP3s for each line.
    *   A single merged MP3 of the entire dialogue with custom pauses.
*   **Cost Estimation**: Displays an estimated cost before generating audio.
*   **NSFW Check**: Optional content safety check using an external API (if `NSFW_API_URL_TEMPLATE` is configured).
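
The two automatic voice strategies above can be sketched roughly as follows. This is an illustrative sketch, not the code in `app.py`: the function names, the `seed` parameter, and the `VOICES` list are assumptions, and the app may expose a different set of voices.

```python
import random

# Assumed voice list for illustration; the app's actual list may differ.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def unique_speakers(lines):
    """Return speaker names in order of first appearance."""
    seen = []
    for speaker, _utterance in lines:
        if speaker not in seen:
            seen.append(speaker)
    return seen


def assign_round_robin(speakers, voices=VOICES):
    """Cycle through the voice list, one voice per unique speaker."""
    return {s: voices[i % len(voices)] for i, s in enumerate(speakers)}


def assign_random(speakers, voices=VOICES, seed=None):
    """Shuffle the voices once, then hand them out in order.

    Each speaker keeps the same voice for the whole run, and voices stay
    distinct as long as there are no more speakers than voices.
    """
    rng = random.Random(seed)
    pool = list(voices)
    rng.shuffle(pool)
    return {s: pool[i % len(pool)] for i, s in enumerate(speakers)}
```

Both helpers return a speaker-to-voice mapping that is built once per run, which is what keeps a speaker's voice consistent across all of their lines.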

## How to Use

1.  **Enter your dialogue script** in the text area.
    Example:
    ```
    [Alice] Hello Bob, how are you today?
    [Bob] I'm doing great, Alice! Thanks for asking.
    [Narrator] And so their conversation began.
    ```
2.  **Select the TTS Model**.
3.  **Set the pause duration** (in milliseconds) between lines for the merged audio.
4.  **Choose a Speaker Configuration Method**:
    *   If "Single Voice (Global)", select the voice.
    *   If "Detailed Configuration...", click "Load/Refresh Per-Speaker Settings UI" and adjust settings for each speaker.
    *   Other methods will apply voices automatically.
5.  (Optional) Adjust **Global Speed** or **Global Instructions** if applicable to your chosen model and configuration.
6.  Click **"Calculate Cost"** to see an estimate.
7.  Click **"Generate Audio"**.
8.  Download the ZIP file or listen to/download the merged MP3.
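
For reference, the `[Speaker] Utterance` format from step 1 could be parsed along these lines. This is a minimal sketch under assumptions: `parse_script` and `LINE_RE` are hypothetical names, and skipping blank or malformed lines is a guess at reasonable behavior, not a description of what the app does.

```python
import re

# Matches "[Speaker] Utterance"; the speaker name is everything inside the brackets.
LINE_RE = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


def parse_script(script):
    """Split a dialogue script into (speaker, utterance) pairs."""
    parsed = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = LINE_RE.match(line)
        if match is None:
            continue  # assumption: lines without a [Speaker] tag are skipped
        speaker = match.group(1).strip()
        text = match.group(2).strip()
        if text:
            parsed.append((speaker, text))
    return parsed
```

Applied to the example script in step 1, this yields three pairs, one each for `Alice`, `Bob`, and `Narrator`.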

## Secrets

This Space requires the following secrets to be set in the Hugging Face Space settings:

*   `OPENAI_API_KEY`: Your OpenAI API key.
*   `NSFW_API_URL_TEMPLATE` (Optional): URL template for NSFW checking, e.g., `https://api.example.com/check?text={text}`. The `{text}` placeholder is replaced with the URL-encoded text to check.
*   `MODEL_DEFAULT` (Optional): Default TTS model (e.g., `tts-1-hd`).
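
Hugging Face Spaces expose secrets to the app as environment variables. The sketch below shows one way the three secrets might be read and how the `{text}` placeholder could be filled. `load_settings` and `build_nsfw_url` are hypothetical helpers, and the `tts-1` fallback for `MODEL_DEFAULT` is an assumption rather than the app's documented default.

```python
import os
from urllib.parse import quote


def load_settings(env=os.environ):
    """Read the Space secrets from the environment."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        # None means the optional NSFW check is disabled.
        "nsfw_template": env.get("NSFW_API_URL_TEMPLATE"),
        # Assumed fallback; the app may choose a different default.
        "default_model": env.get("MODEL_DEFAULT", "tts-1"),
    }


def build_nsfw_url(template, text):
    """Substitute URL-encoded text for the {text} placeholder."""
    return template.replace("{text}", quote(text, safe=""))
```

Using `str.replace` instead of `str.format` keeps the substitution safe when the template contains other braces.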

## Smoke Test Script

Use the following script to test basic functionality:

    [Gandalf] You shall not pass!
    [Frodo] I will take the Ring to Mordor.
    [Gandalf] So be it.

Choose your desired model and settings (e.g., "Random per Speaker"), then generate.

## Deployment

This application is designed to be deployed as a Hugging Face Space.
Ensure `ffmpeg` is available (handled by `container.yaml` for Classic Spaces).
Set the necessary secrets in your Space settings on Hugging Face Hub.
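
To confirm that `ffmpeg` is actually on the `PATH` in a deployed Space, a quick startup check along these lines may help. This is a sketch; `ffmpeg_available` is a hypothetical helper and is not part of the app.

```python
import shutil


def ffmpeg_available(binary="ffmpeg"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if not ffmpeg_available():
        print("Warning: ffmpeg not found; merging MP3s will likely fail.")
```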