Whisper-Small-En: ASR

Whisper-Small-En, developed by OpenAI, is the English-only small variant of the Whisper speech recognition family. Built on an encoder-decoder Transformer, it scales the parameter count (~240M, see the figures below) beyond the Tiny and Base versions to improve transcription accuracy and contextual comprehension. It supports high-accuracy speech-to-text conversion and voice command analysis, and its training on large-scale labeled audio helps it cope with accents, background noise, and domain-specific terminology. Well suited to scenarios that demand reliability, such as professional meetings, medical dictation, and legal documentation, it balances efficiency and performance on mid-tier GPUs or cloud platforms. Practical challenges include handling long audio sequences, keeping real-time latency low, and managing computational resources.
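A minimal usage sketch, assuming the open-source openai-whisper package and a local file audio.wav (both are assumptions for illustration, not part of this card):

  # pip install -U openai-whisper
  import whisper

  # Load the English-only small checkpoint (~240M parameters).
  model = whisper.load_model("small.en")

  # Transcribe a local file; Whisper resamples audio to 16 kHz and processes it
  # in 30-second windows (the 1x80x3000 log-mel input listed below).
  result = model.transcribe("audio.wav")
  print(result["text"])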

Source model

  • Input shape: [1x80x3000] (encoder), [[1x1],[1x1],[12x12x64x1500],[12x12x1500x64],[12x12x64x224],[12x12x224x64]] (decoder); see the I/O sketch after this list
  • Number of parameters: 102M (encoder), 139M (decoder)
  • Model size: 390M (encoder), 531M (decoder)
  • Output shape: [[12x12x64x1500],[12x12x1500x64]] (encoder), [[1x1x51864],[12x12x64x224],[12x12x224x64]] (decoder)
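A minimal sketch of what these tensors correspond to, assuming the first shape group belongs to the encoder and the second to the decoder (an inference from the shapes, not stated explicitly in the card), using NumPy placeholders only for illustration:

  import numpy as np

  # Encoder input: 30 s of audio as an 80-bin log-mel spectrogram (3000 frames).
  mel = np.zeros((1, 80, 3000), dtype=np.float32)

  # Encoder outputs: cross-attention key/value caches for 12 layers x 12 heads,
  # head dimension 64, over 1500 encoder positions.
  cross_k = np.zeros((12, 12, 64, 1500), dtype=np.float32)
  cross_v = np.zeros((12, 12, 1500, 64), dtype=np.float32)

  # Decoder inputs: current token id, decode position, the cross-attention
  # caches above, and self-attention caches over up to 224 past tokens.
  token_id = np.zeros((1, 1), dtype=np.int64)
  position = np.zeros((1, 1), dtype=np.int64)
  self_k = np.zeros((12, 12, 64, 224), dtype=np.float32)
  self_v = np.zeros((12, 12, 224, 64), dtype=np.float32)

  # Decoder outputs: logits over the 51864-token English-only vocabulary,
  # plus updated self-attention caches.
  logits = np.zeros((1, 1, 51864), dtype=np.float32)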

The source model can be found here.

Performance Reference

Please search for the model by name in Model Farm.

Inference & Model Conversion

Please search for the model by name in Model Farm.
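As a rough sketch of one possible conversion starting point (assuming the Hugging Face openai/whisper-small.en checkpoint, an ONNX target, and opset 17; the actual Model Farm toolchain and the KV-cache split shown above are not covered here):

  # pip install torch transformers
  import torch
  from transformers import WhisperModel

  class EncoderWrapper(torch.nn.Module):
      """Return the encoder's hidden states as a plain tensor for ONNX export."""
      def __init__(self, encoder):
          super().__init__()
          self.encoder = encoder

      def forward(self, input_features):
          return self.encoder(input_features).last_hidden_state

  model = WhisperModel.from_pretrained("openai/whisper-small.en").eval()
  wrapper = EncoderWrapper(model.get_encoder())

  # Dummy 80-bin, 3000-frame log-mel input matching the [1x80x3000] shape above.
  dummy_mel = torch.randn(1, 80, 3000)

  torch.onnx.export(
      wrapper,
      dummy_mel,
      "whisper_small_en_encoder.onnx",
      input_names=["input_features"],
      output_names=["last_hidden_state"],
      opset_version=17,
  )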

License
