---
license: mit
datasets:
  - mozilla-foundation/common_voice_11_0
language:
  - fa
metrics:
  - wer
base_model:
  - openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
---

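This model is a fine-tune of openai/whisper-tiny for Persian (fa). It was trained for one epoch on the validation split of Common Voice 11.0, reaching a training loss of 0.05, and evaluated on the test split, where it reached a loss of 0.07 and a WER of 1.636687802644541.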
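
For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that standard definition, not the evaluation script used for this model; the function name `word_error_rate` is hypothetical. The card does not state whether the reported figure is a ratio or a percentage, and note that a WER expressed as a ratio can exceed 1 when the output contains many inserted words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```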

How to use the model in Colab:

# Install required packages (google.colab is preinstalled in Colab, so it is not installed here)
!pip install torch torchaudio transformers pydub

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from pydub import AudioSegment
import os
from google.colab import files

# Load the model and processor
model_id = "hackergeek98/tinyyyy_whisper"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Create pipeline
whisper_pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1,
)

# Convert audio to WAV format
def convert_to_wav(audio_path):
    audio = AudioSegment.from_file(audio_path)
    wav_path = "converted_audio.wav"
    audio.export(wav_path, format="wav")
    return wav_path

# Transcribe an audio file and save as text
def transcribe(audio_path):
    wav_path = convert_to_wav(audio_path)
    try:
        result = whisper_pipe(wav_path)
    finally:
        os.remove(wav_path)  # Clean up the temporary file even if transcription fails
    
    # Save transcription to a text file
    text_path = "transcription.txt"
    with open(text_path, "w", encoding="utf-8") as f:  # Persian text needs an explicit UTF-8 encoding
        f.write(result["text"])
    
    return text_path

# Upload and process audio in Colab
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]
transcription_file = transcribe(audio_file)

# Download the transcription file
files.download(transcription_file)
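
Whisper models process audio in windows of about 30 seconds, so longer recordings may come out truncated when passed to the pipeline as-is. The sketch below shows one possible way to call the pipeline with chunking and an explicit language hint. The helper name `transcribe_long` is hypothetical, and it assumes that `pipe` behaves like the `whisper_pipe` object created above, that the installed transformers version accepts `chunk_length_s` and `generate_kwargs` at call time, and that the model's generation config supports the `language` and `task` options (older fine-tuned checkpoints may not).

```python
def transcribe_long(pipe, audio_path, language="persian", chunk_length_s=30):
    """Transcribe a possibly long audio file with an ASR pipeline.

    `pipe` is assumed to be a transformers automatic-speech-recognition
    pipeline such as `whisper_pipe`; any callable with the same interface works.
    """
    result = pipe(
        audio_path,
        chunk_length_s=chunk_length_s,  # split long audio into fixed-length windows
        generate_kwargs={"language": language, "task": "transcribe"},  # assumed to be supported
    )
    return result["text"].strip()
```

In the Colab script, this would be called as `transcribe_long(whisper_pipe, wav_path)` in place of `whisper_pipe(wav_path)["text"]`. If the language hint raises an error, dropping `generate_kwargs` and keeping only `chunk_length_s` is a reasonable fallback.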