---
license: apache-2.0
language:
  - it
  - en
  - pl
  - de
  - fr
base_model:
  - nari-labs/Dia-1.6B
pipeline_tag: text-to-speech
tags:
  - speech
  - dia
  - text-to-speech
  - vocal
  - voice
---

# Aurora-1.6B: Multilingual Emotion and Singing TTS Model

A fine-tuned version of Dia-1.6B trained on multilingual and singing datasets, with full emotion control and zero-shot voice cloning.

## Features

- **Multilingual Support**  
  Natural speech in Italian, English, Polish, German, French, and more.  
- **Emotion Control**  
  Combine speaker tags (e.g. `[S1]`) with emotion tokens (e.g. `[happy]`, `[sad]`) to modulate expressiveness.  
- **Singing Capabilities**  
  Generate melodic vocals by providing singing prompts or style references.  
- **Zero-Shot Voice Cloning**  
  Clone any speaker’s voice from a short audio sample.  
- **Nonverbal Vocalizations**  
  Embed realistic effects like `(laughs)`, `(coughs)`, or `(sighs)` inline.
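
The tag conventions listed above can be combined in a single prompt string. The following is a minimal sketch of one way to assemble such prompts; `build_prompt` is a hypothetical helper written for illustration, not part of the `dia` package, and the tag layout is assumed from the examples in this README.

```python
def build_prompt(text, speaker="S1", emotion=None, nonverbal=None):
    """Assemble a prompt: speaker tag, optional emotion token, text, optional nonverbal cue.

    Hypothetical helper for illustration only; the token format is assumed
    from the examples in this README.
    """
    tags = f"[{speaker}]"
    if emotion:
        tags += f"[{emotion}]"
    prompt = f"{tags} {text.strip()}"
    if nonverbal:
        # Nonverbal cues are written inline in parentheses, e.g. (laughs)
        prompt += f" ({nonverbal})"
    return prompt


# Example: a happy line that ends with a laugh
happy_line = build_prompt("Hello world!", emotion="happy", nonverbal="laughs")
print(happy_line)  # [S1][happy] Hello world! (laughs)
```

The resulting string can be passed to `model.generate` in the same way as the hand-written prompt in the Usage section.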

## Usage

```python
from dia.model import Dia
import soundfile as sf

# Load the Aurora-1.6B model
model = Dia.from_pretrained("Lorenzob/aurora-1.6b")

# Generate a happy spoken line followed by singing
text = "[S1][happy] Hello world! Now sing 'Happy Birthday to You'"
audio = model.generate(text)

# Save output at 44.1 kHz
sf.write("output.wav", audio, 44100)