---
license: apache-2.0
language:
- it
- en
- pl
- de
- fr
base_model:
- nari-labs/Dia-1.6B
pipeline_tag: text-to-speech
tags:
- speech
- dia
- text-to-speech
- vocal
- voice
---

# Aurora-1.6B: Multilingual Emotion and Singing TTS Model

A fine-tuned version of Dia-1.6B trained on multilingual and singing datasets, with full emotion control and zero-shot voice cloning.

## Features

- **Multilingual Support**: Natural speech in Italian, English, Polish, German, French, and more.
- **Emotion Control**: Use speaker tags (e.g. `[S1]`) together with emotion tokens (e.g. `[happy]`, `[sad]`) to modulate expressiveness.
- **Singing Capabilities**: Generate melodic vocals by providing singing prompts or style references.
- **Zero-Shot Voice Cloning**: Clone a speaker's voice from a short audio sample.
- **Nonverbal Vocalizations**: Embed realistic effects like `(laughs)`, `(coughs)`, or `(sighs)` inline.

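
The tags listed under Features are plain text placed inside the prompt, so a prompt can be assembled with ordinary string formatting. The sketch below is only an illustration: `build_prompt` is a hypothetical helper, not part of the `dia` package, and it assumes the tag layout shown in this card (speaker tag first, then an optional emotion token, then the text, with an optional nonverbal cue at the end).

```python
def build_prompt(text, speaker="S1", emotion=None, nonverbal=None):
    """Compose a prompt string in the tag style used in this card.

    Hypothetical helper (not part of the dia library). Assumes tags are
    plain text: "[S1][happy] Hello (laughs)".
    """
    # Speaker tag always comes first, e.g. "[S1]"
    prefix = f"[{speaker}]"
    # Optional emotion token directly after the speaker tag, e.g. "[happy]"
    if emotion:
        prefix += f"[{emotion}]"
    parts = [prefix, text.strip()]
    # Optional nonverbal cue appended inline, e.g. "(laughs)"
    if nonverbal:
        parts.append(f"({nonverbal})")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt("Hello world!", emotion="happy"))
    print(build_prompt("That was funny", speaker="S2", nonverbal="laughs"))
```

The resulting string can then be passed to the model in place of a hand-written prompt. Which emotion tokens and nonverbal cues the model actually responds to depends on its training data, so treat any tag not listed above as untested.
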
## Usage

```python
from dia.model import Dia
import soundfile as sf

# Load the Aurora-1.6B model
model = Dia.from_pretrained("Lorenzob/aurora-1.6b")

# Generate a happy spoken line followed by singing
text = "[S1][happy] Hello world! Now sing 'Happy Birthday to You'"
audio = model.generate(text)

# Save output at 44.1 kHz
sf.write("output.wav", audio, 44100)