Whisper Medium ar

This model is a fine-tuned version of openai/whisper-medium on the Common Voice 17.0 dataset. It achieves the following results on the evaluation set:

Loss: 0.2149
WER: 20.4679
CER: 5.6352
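
The card does not say how these metrics were computed or which text normalization was applied. As a reference point only, the sketch below implements the standard definitions: word error rate and character error rate as the total edit distance divided by the total reference length, expressed in percent. The helper names `edit_distance` and `error_rate` are illustrative, and counting spaces as characters for CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    # Assumption: whitespace tokenization for words, raw characters (spaces included) for CER.
    split = (lambda s: s.split()) if unit == "word" else (lambda s: list(s))
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * edits / total


# Toy example: one substitution ("sat" -> "sit") and one deleted word ("the").
refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
wer = error_rate(refs, hyps, unit="word")
cer = error_rate(refs, hyps, unit="char")
```

On this scale, the reported WER of 20.4679 corresponds to roughly one word-level edit for every five reference words.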

Model description

More information needed

Intended uses & limitations

More information needed
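
Until this section is filled in, the following is a minimal inference sketch. It assumes the checkpoint loads through the standard `transformers` automatic-speech-recognition pipeline, as Whisper fine-tunes generally do. The repository id is taken from the citation entry; the `transcribe` helper and the audio path are hypothetical.

```python
MODEL_ID = "deepdml/whisper-medium-ar-mix-norm"  # repository id from the citation entry


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file to Arabic text (hypothetical helper, not part of the model repo)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second windows
    )
    # Pin the language and task so the decoder does not fall back to auto-detection.
    result = asr(audio_path, generate_kwargs={"language": "arabic", "task": "transcribe"})
    return result["text"]


# Example call (needs transformers, torch, and ffmpeg for audio decoding):
# print(transcribe("sample.wav"))  # "sample.wav" is a placeholder path
```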

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.04
training_steps: 18000
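
To make the schedule concrete, the sketch below works out the learning rate implied by the values above: a linear warmup over the first 4% of the 18000 steps, followed by a linear decay to zero. It is a plain-Python illustration of a linear schedule with warmup and not the training code itself; rounding the warmup length up is an assumption about how the trainer converts the ratio into steps.

```python
import math

LEARNING_RATE = 1e-05
TRAINING_STEPS = 18000
WARMUP_RATIO = 0.04

# Warmup length implied by the ratio (assumption: rounded up to a whole step).
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from the peak to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


# Sample the schedule at a few points of interest.
for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate of 1e-05 is reached at step 720, before the first evaluation checkpoint at step 1000.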

Training results

Training Loss	Epoch	Step	Validation Loss	WER	CER
0.4929	0.0556	1000	0.3300	28.9234	9.0009
0.2883	0.1111	2000	0.2984	27.7612	7.8800
0.142	0.1667	3000	0.2847	25.8332	7.5636
0.0746	0.2222	4000	0.2812	25.1152	7.3684
0.0501	0.2778	5000	0.2702	24.9463	7.1645
0.0421	0.3333	6000	0.2640	24.9610	7.1298
0.0292	0.3889	7000	0.2574	23.3984	6.6850
0.0291	0.4444	8000	0.2575	23.1523	6.5031
0.0216	0.5	9000	0.2555	24.4983	6.7680
0.0179	0.5556	10000	0.2440	22.4142	6.1291
0.0166	0.6111	11000	0.2416	21.7183	6.0801
0.0104	0.6667	12000	0.2405	22.0525	6.1413
0.0107	0.7222	13000	0.2457	22.5336	6.1634
0.01	0.7778	14000	0.2374	21.2758	5.8735
0.0155	0.8333	15000	0.2317	22.0727	5.9926
0.0081	0.8889	16000	0.2285	20.8296	5.7606
0.0051	0.9444	17000	0.2250	20.7121	5.6673
0.0067	1.0	18000	0.2149	20.4679	5.6352
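
The table can be read programmatically. The sketch below copies the step, validation loss, WER, and CER columns and derives three things: the checkpoint with the lowest WER, the checkpoints where WER rose relative to the previous evaluation, and the relative WER reduction between the first and last evaluations. The variable names are illustrative.

```python
# (step, validation_loss, wer, cer) copied from the table above
RESULTS = [
    (1000, 0.3300, 28.9234, 9.0009),
    (2000, 0.2984, 27.7612, 7.8800),
    (3000, 0.2847, 25.8332, 7.5636),
    (4000, 0.2812, 25.1152, 7.3684),
    (5000, 0.2702, 24.9463, 7.1645),
    (6000, 0.2640, 24.9610, 7.1298),
    (7000, 0.2574, 23.3984, 6.6850),
    (8000, 0.2575, 23.1523, 6.5031),
    (9000, 0.2555, 24.4983, 6.7680),
    (10000, 0.2440, 22.4142, 6.1291),
    (11000, 0.2416, 21.7183, 6.0801),
    (12000, 0.2405, 22.0525, 6.1413),
    (13000, 0.2457, 22.5336, 6.1634),
    (14000, 0.2374, 21.2758, 5.8735),
    (15000, 0.2317, 22.0727, 5.9926),
    (16000, 0.2285, 20.8296, 5.7606),
    (17000, 0.2250, 20.7121, 5.6673),
    (18000, 0.2149, 20.4679, 5.6352),
]

# Checkpoint with the lowest WER.
best_step, _, best_wer, _ = min(RESULTS, key=lambda row: row[2])

# Steps at which WER increased relative to the previous evaluation.
regressions = [cur[0] for prev, cur in zip(RESULTS, RESULTS[1:]) if cur[2] > prev[2]]

# Relative WER reduction (in percent) between the first and last evaluations.
first_wer, last_wer = RESULTS[0][2], RESULTS[-1][2]
relative_reduction = 100.0 * (first_wer - last_wer) / first_wer

print(best_step, best_wer, regressions, round(relative_reduction, 1))
```

WER does not fall monotonically: several intermediate checkpoints are worse than their predecessors, yet the final checkpoint at step 18000 has the lowest WER, CER, and validation loss, with WER about 29% lower in relative terms than at step 1000.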

Framework versions

Transformers 4.48.0.dev0
PyTorch 2.5.1+cu121
Datasets 3.6.0
Tokenizers 0.21.0

Citation

Please cite the model using the following BibTeX entry:

@misc{deepdml/whisper-medium-ar-mix-norm,
      title={Fine-tuned Whisper medium ASR model for speech recognition in Arabic},
      author={Jimenez, David},
      howpublished={\url{https://huggingface.co/deepdml/whisper-medium-ar-mix-norm}},
      year={2026}
}

Model size: 0.8B params (Safetensors)
Tensor type: F32
