Dia TTS Server | Text-to-Dialogue


Generate Speech with Dia

Text to speak

Use [S1] and [S2] tags for speaker turns. Add non-verbal cues such as (laughs).

0 / 8192
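If you generate prompts programmatically, the following minimal sketch joins speaker turns into one tagged string. `build_dialogue` and `MAX_CHARS` are hypothetical names. It also assumes the 8192 counter is a character limit, which this page does not confirm.

```python
# Hypothetical helper: builds a Dia prompt from (speaker, text) turns.
# Assumption: the "0 / 8192" counter in the form counts characters.
MAX_CHARS = 8192


def build_dialogue(turns: list[tuple[str, str]]) -> str:
    """Join (speaker, line) pairs into a single [S1]/[S2]-tagged prompt."""
    parts = []
    for speaker, line in turns:
        if speaker not in ("S1", "S2"):
            raise ValueError(f"unknown speaker tag: {speaker!r}")
        parts.append(f"[{speaker}] {line.strip()}")
    text = " ".join(parts)
    if len(text) > MAX_CHARS:
        raise ValueError(f"prompt is {len(text)} chars; limit is {MAX_CHARS}")
    return text


if __name__ == "__main__":
    prompt = build_dialogue([
        ("S1", "Did you hear the news? (laughs)"),
        ("S2", "Not yet. Tell me everything."),
    ])
    print(prompt)
    # [S1] Did you hear the news? (laughs) [S2] Not yet. Tell me everything.
```

Non-verbal cues like (laughs) go inside the line text. The helper only adds the speaker tags.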

Voice Mode

Single / Dialogue (use [S1]/[S2]) • Voice Clone (from reference)

Load Example Preset

{% if presets %} {% for preset in presets %} {% endfor %} {% else %}

No presets loaded. Check presets.yaml.

{% endif %}

Generation Parameters

{% set current_gen_params = submitted_gen_params if submitted_gen_params else default_gen_params %}

Speed Factor ({{ current_gen_params.speed_factor }})

CFG Scale ({{ current_gen_params.cfg_scale }})

Temperature ({{ current_gen_params.temperature }})

Top P ({{ current_gen_params.top_p }})

CFG Filter Top K ({{ current_gen_params.cfg_filter_top_k }})
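The labels above display values from `current_gen_params`. The template's `set` statement assigns this from the submitted form values if there are any, and from the defaults otherwise. The sketch below mirrors that fallback in Python. The numbers in `DEFAULTS` are illustrative placeholders, not the server's actual defaults.

```python
from typing import Optional


def current_gen_params(submitted: Optional[dict], default: dict) -> dict:
    """Return the submitted parameters if any were posted, else the defaults."""
    # Jinja's `if` uses Python truthiness: None and an empty dict both fall back.
    return submitted if submitted else default


# Placeholder values for illustration only; not the server's real defaults.
DEFAULTS = {
    "speed_factor": 1.0,
    "cfg_scale": 3.0,
    "temperature": 1.3,
    "top_p": 0.95,
    "cfg_filter_top_k": 35,
}

if __name__ == "__main__":
    print(current_gen_params(None, DEFAULTS)["cfg_scale"])  # 3.0
    edited = {**DEFAULTS, "temperature": 0.9}
    print(current_gen_params(edited, DEFAULTS)["temperature"])  # 0.9
```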

Server Configuration

These settings are saved to the .env file. Restart the server to apply changes.

Model Repo ID

Model Config Filename

Model Weights Filename

Model Cache Path

Reference Audio Path

Output Path

Server Host

Server Port
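The configuration fields above are saved to a `.env` file and take effect only after a restart. The following minimal sketch reads and updates simple `KEY=VALUE` lines. It assumes plain `.env` syntax. The server's actual variable names are not shown on this page, so the key in the example is hypothetical.

```python
from pathlib import Path


def read_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def update_env(path: Path, updates: dict[str, str]) -> None:
    """Merge `updates` into the file (comments are not preserved).

    The running server will not see the change until it is restarted.
    """
    values = read_env(path)
    values.update(updates)
    body = "\n".join(f"{k}={v}" for k, v in values.items()) + "\n"
    path.write_text(body, encoding="utf-8")


if __name__ == "__main__":
    import tempfile

    env_file = Path(tempfile.mkdtemp()) / ".env"
    update_env(env_file, {"HYPOTHETICAL_SERVER_PORT": "8000"})
    print(read_env(env_file))  # {'HYPOTHETICAL_SERVER_PORT': '8000'}
```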

Use [S1]/[S2] for dialogue, and add non-verbal cues such as (laughs).

{% if output_file_url %}

Generated Audio

Download WAV

Mode: {{ submitted_voice_mode }} {% if submitted_voice_mode == 'clone' and submitted_clone_file %} ({{ submitted_clone_file }}) {% endif %} • Generation Time: {{ generation_time }}s • Duration: --:--

{% endif %}
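The "Duration: --:--" placeholder is filled in after the audio loads. As an offline check on a downloaded WAV, this sketch computes the duration from the frame count and sample rate and formats it as MM:SS. It assumes the file is PCM WAV, because Python's `wave` module cannot read floating-point WAV. The function names are hypothetical.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Duration of a PCM WAV file, given its raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def format_mmss(seconds: float) -> str:
    """Format seconds as MM:SS (truncating), matching the '--:--' shape."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Build 1.5 s of 16-bit mono silence in memory to stand in for a download.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(24000)
        w.writeframes(b"\x00\x00" * 36000)
    secs = wav_duration_seconds(buf.getvalue())
    print(secs, format_mmss(secs))  # 1.5 00:01
```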

Tips & Tricks for Dia

- For **Dialogue** mode, mark speaker turns clearly with [S1] and [S2].
- Add non-verbal sounds such as (laughs), (sighs), or (clears throat) wherever you want them in the text.
- For **Voice Clone** mode, upload a clean reference audio file (.wav or .mp3) using the "Load" button. Crucially, begin your text input with the exact transcript of the reference audio (e.g., [S1] Reference transcript. [S1] Target text...).
- Experiment with **CFG Scale** (higher = closer adherence to the text, potentially less natural) and **Temperature** (higher = more random and varied output).
- **Speed Factor** adjusts playback speed (e.g., 0.8 = slower, 1.0 = original speed).
- For OpenAI compatibility, use the `/v1/audio/speech` endpoint and select the mode with the `voice` parameter ('S1', 'S2', 'dialogue', or a reference file name such as 'reference_file.wav').
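The following hedged sketch builds an OpenAI-style request to `/v1/audio/speech` using only the standard library. It applies the clone-mode tip by starting the input with the reference transcript. Several details are assumptions not confirmed by this page: the base URL (substitute your configured Server Host and Port), the `input` field name (taken from OpenAI's schema), and the `model` value.

```python
import json
import urllib.request

# Assumption: replace with the Server Host/Port from your configuration.
BASE_URL = "http://localhost:8000"


def build_speech_request(text: str, voice: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style speech endpoint."""
    payload = {
        "model": "dia",  # hypothetical placeholder; check what the server accepts
        "input": text,   # field name assumed from OpenAI's schema
        "voice": voice,  # 'S1', 'S2', 'dialogue', or a reference file name
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request(
        "[S1] Exact transcript of the reference clip. [S1] New text in that voice.",
        voice="reference_file.wav",
    )
    # To send to a running server:
    # with urllib.request.urlopen(req) as resp:
    #     open("out.wav", "wb").write(resp.read())
    print(req.full_url, req.get_method())
    # http://localhost:8000/v1/audio/speech POST
```

The request is built separately from being sent, so you can inspect the payload before contacting a live server.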