{% if error %} {% endif %} {% if success %} {% endif %}

Generate Speech with Dia

Mark speaker turns with [S1] and [S2] tags. Add non-verbal sounds such as (laughs) inline.

0 / 8192
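The Python sketch below illustrates the [S1]/[S2] tagging convention. It builds a script from (speaker, text) pairs and rejects any script longer than 8192 characters. The helper name `build_dialogue` is hypothetical. Treating the 8192 counter as a plain character count is also an assumption.

```python
MAX_CHARS = 8192  # limit shown by the text counter; assumed to be a plain character count


def build_dialogue(turns):
    """Join (speaker, text) pairs into a Dia script using [S1]/[S2] tags.

    Hypothetical helper for illustration: `turns` is a list of tuples where
    speaker is 1 or 2 and text may contain non-verbals such as "(laughs)".
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        # Each turn starts with its speaker tag, e.g. "[S1] Hello."
        parts.append(f"[S{speaker}] {text.strip()}")
    script = " ".join(parts)
    if len(script) > MAX_CHARS:
        raise ValueError(f"script is {len(script)} characters; limit is {MAX_CHARS}")
    return script


if __name__ == "__main__":
    print(build_dialogue([(1, "Did you hear that? (laughs)"), (2, "I did.")]))
```

For the two example turns, this prints `[S1] Did you hear that? (laughs) [S2] I did.`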
{% if presets %} {% for preset in presets %} {% endfor %} {% else %}

No presets loaded. Check presets.yaml.

{% endif %}
Generation Parameters
{% set current_gen_params = submitted_gen_params if submitted_gen_params else default_gen_params %}
Server Configuration

These settings are saved to the .env file. Restart the server to apply changes.

{% if output_file_url %}

Generated Audio

Mode: {{ submitted_voice_mode }} {% if submitted_voice_mode == 'clone' and submitted_clone_file %} ({{ submitted_clone_file }}) {% endif %} • Gen Time: {{ generation_time }}s • Duration: --:--
{% endif %}

Tips & Tricks for Dia

  • For **Dialogue** mode, clearly mark speaker turns using [S1] and [S2].
  • Add non-verbal sounds like (laughs), (sighs), (clears throat) within the text where desired.
  • For **Voice Clone** mode, upload a clean reference audio file (.wav or .mp3) with the "Load" button. Then begin your text input with the exact transcript of that reference audio (e.g., [S1] Reference transcript. [S1] Target text...).
  • Experiment with **CFG Scale** (higher values follow the text more closely but may sound less natural) and **Temperature** (higher values produce more random, varied output).
  • The **Speed Factor** adjusts playback speed: values below 1.0 slow it down (e.g., 0.8), and 1.0 keeps the original speed.
  • For OpenAI-compatible clients, use the /v1/audio/speech endpoint. Set the voice parameter to choose the mode: 'S1', 'S2', 'dialogue', or a reference audio filename such as 'reference_file.wav' for cloning.
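The Voice Clone tip says the exact transcript of the reference audio goes at the start of the input. The Python sketch below prepends it, following the "[S1] Reference transcript. [S1] Target text..." pattern. The function `build_clone_prompt` is a hypothetical helper, not part of the server.

```python
def build_clone_prompt(reference_transcript, target_text, speaker="S1"):
    """Prefix the target text with the reference clip's transcript.

    Hypothetical helper: mirrors the "[S1] Reference transcript. [S1] Target text"
    pattern from the tips, using the same speaker tag for both parts.
    """
    transcript = reference_transcript.strip()
    if not transcript:
        # The tips call the exact transcript crucial, so refuse an empty one.
        raise ValueError("reference transcript must not be empty")
    return f"[{speaker}] {transcript} [{speaker}] {target_text.strip()}"


if __name__ == "__main__":
    print(build_clone_prompt("Reference transcript.", "Target text."))
```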
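The sketch below calls the OpenAI-compatible endpoint using only the Python standard library. The request fields (`model`, `input`, `voice`, `response_format`, `speed`) follow OpenAI's speech request body. Whether this server reads every field is an assumption, and so is the placeholder model name `"dia"`. The helper names are hypothetical.

```python
import json
import urllib.request


def build_speech_payload(text, voice, model="dia", response_format="wav", speed=1.0):
    """Build a JSON body in the shape of OpenAI's /v1/audio/speech request.

    Assumptions: the server accepts these OpenAI field names, and "dia" is a
    placeholder model name. `voice` selects the mode, e.g. 'S1', 'S2',
    'dialogue', or a reference file name for cloning.
    """
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


def request_speech(base_url, payload, timeout=300):
    """POST the payload to {base_url}/v1/audio/speech and return the raw audio bytes."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example, build a payload with `build_speech_payload("[S1] Hello there. [S2] Hi! (laughs)", voice="dialogue")`. Pass it to `request_speech` with your server's base URL and write the returned bytes to a .wav file. The host and port depend on your server configuration.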