F5-TTS Fine-tuned for Dhivehi (ދިވެހި)

Fine-tuned F5-TTS model for Dhivehi (Maldivian) text-to-speech with zero-shot voice cloning.

Model Details

Architecture: DiT (dim=1024, depth=22, heads=16)
Base Model: F5-TTS v1 Base
Vocoder: Vocos (24kHz)
Tokenizer: Custom character-level (Thaana + Latin + punctuation)
Vocab size: 2604 characters (59 Thaana chars added to base vocab)
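
Because the tokenizer is character-level, any character missing from the vocabulary cannot be mapped to a token. The sketch below checks input text against a vocabulary file before synthesis. It assumes the file holds one token per line, which this card does not state; `load_vocab` and `missing_chars` are hypothetical helper names, not part of F5-TTS.

```python
from pathlib import Path


def load_vocab(vocab_path):
    """Read a vocab file, assuming one token per line (layout is an assumption)."""
    text = Path(vocab_path).read_text(encoding="utf-8")
    # Split on newlines only, so a token that is a single space survives.
    return {line for line in text.split("\n") if line != ""}


def missing_chars(text, vocab):
    """Return characters of `text` absent from `vocab`, in order of first appearance."""
    missing = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing
```

A call such as `missing_chars(gen_text, load_vocab("vocab.txt"))` returning an empty list would suggest that every character in the input is covered.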

Usage

from f5_tts.api import F5TTS

# Load the fine-tuned checkpoint with the extended vocabulary.
tts = F5TTS(
    model="F5TTS_v1_Base",
    ckpt_file="model.pt",
    vocab_file="vocab.txt",
)

# ref_file and ref_text provide the voice to clone; gen_text is the Dhivehi text to synthesize.
wav, sr, _ = tts.infer(
    ref_file="reference.wav",
    ref_text="reference text in Dhivehi",
    gen_text="ދިވެހިރާއްޖެއަކީ ވަރަކް ރީތި ޔައުމެކެވެ",
)
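
The snippet above leaves the audio in memory. As a minimal sketch of writing it to disk with only the standard library, the helper below assumes `wav` is a one-dimensional sequence of mono float samples in the range [-1.0, 1.0], which this card does not confirm. `save_wav` is a hypothetical helper name, not part of the F5-TTS API.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to avoid integer overflow
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 2 bytes = 16-bit samples
        f.setframerate(int(sample_rate))
        f.writeframes(bytes(frames))
```

With the variables from the usage example, this would be called as `save_wav("output.wav", wav, sr)`.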

Training Data

Dataset	Samples
Serialtechlab/dhivehi-mms-v5-combined	~9,660
Serialtechlab/dv-presidential-speech	~1,660
alakxender/dv-audio-syn-lg	~50,000 (synthetic)
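
To make the data mix explicit, the following arithmetic uses the approximate counts from the table. The counts are rounded in the source, so the results are estimates.

```python
# Approximate sample counts copied from the table above.
samples = {
    "Serialtechlab/dhivehi-mms-v5-combined": 9_660,
    "Serialtechlab/dv-presidential-speech": 1_660,
    "alakxender/dv-audio-syn-lg": 50_000,  # synthetic
}

total = sum(samples.values())
synthetic_share = samples["alakxender/dv-audio-syn-lg"] / total

print(total)                      # 61320
print(round(synthetic_share, 3))  # 0.815
```

By sample count, roughly four fifths of the training set is synthetic speech.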

Training Config

Learning rate: 1e-05
Batch size: 19200 frames
Epochs: 100
Mixed precision: bf16
GPU: NVIDIA A100 40GB
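
The batch size is given in mel frames, not utterances. The sketch below converts it to seconds of audio. The 24 kHz sample rate comes from the Vocoder line in Model Details. The hop length of 256 samples per frame is an assumption, since this card does not state it.

```python
SAMPLE_RATE = 24_000   # Hz, from the Vocos (24kHz) entry in Model Details
HOP_LENGTH = 256       # samples per mel frame; assumed, not stated in this card
BATCH_FRAMES = 19_200  # from the training config above

frames_per_second = SAMPLE_RATE / HOP_LENGTH
batch_seconds = BATCH_FRAMES / frames_per_second

print(frames_per_second)  # 93.75
print(batch_seconds)      # 204.8
```

Under that assumption, each batch holds about 205 seconds of audio.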

Files

model.pt - Fine-tuned F5-TTS weights
vocab.txt - Extended character vocabulary (Thaana + base)
config.json - Training configuration
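
One possible way to fetch these files is `hf_hub_download` from the `huggingface_hub` package. This is a hedged sketch and not an official download procedure. `download_model_files` and `MODEL_FILES` are hypothetical names, and a repository that requires accepting access conditions will also need an authenticated token.

```python
MODEL_FILES = ["model.pt", "vocab.txt", "config.json"]


def download_model_files(repo_id, local_dir, token=None):
    """Download each listed file and return a {filename: local_path} dict."""
    # Imported here so the module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(
            repo_id=repo_id, filename=name, local_dir=local_dir, token=token
        )
        for name in MODEL_FILES
    }
```

For example, `paths = download_model_files("Serialtechlab/f5-tts-dhivehi", "./f5-tts-dhivehi")` would return local paths that can be passed as `ckpt_file` and `vocab_file` in the usage example.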

Model tree: Serialtechlab/f5-tts-dhivehi is fine-tuned from the base model SWivid/F5-TTS.