Commit 140a297 · verified · 1 parent: c6fcff6
SimFonX committed: Update README.md

Files changed (1): README.md (+87 −3)
 
# Whisper ONNX Optimized Models

Optimized Whisper ONNX models packaged for easy deployment. Each zip contains all necessary files for inference.

## Models Available

| Model | Language | Size | Target Use | Download |
|-------|----------|------|------------|----------|
| **Medium English** | English-only | ~486MB | High quality English transcription | [whisper-medium-en-onnx.zip](medium-en/whisper-medium-en-onnx.zip) |
| **Small English** | English-only | ~85MB | Fast English transcription | [whisper-small-en-onnx.zip](small-en/whisper-small-en-onnx.zip) |
| **Small Multilingual** | 99 languages | ~110MB | Fast multilingual transcription | [whisper-small-multilingual-onnx.zip](small-multilingual/whisper-small-multilingual-onnx.zip) |
| **Medium Multilingual** | 99 languages | ~295MB | High quality multilingual transcription | [whisper-medium-multilingual-onnx.zip](medium-multilingual/whisper-medium-multilingual-onnx.zip) |
| **Large v3 Turbo** | 99 languages | ~530MB | Best quality, fastest large model | [whisper-large-v3-turbo-onnx.zip](large-v3-turbo/whisper-large-v3-turbo-onnx.zip) |
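To grab a model programmatically, a plain zip download is enough. Below is a minimal Python sketch; the URL is a placeholder (an assumption, not a real link), so substitute the actual link of the zip you picked from the table:

```python
# Minimal download-and-extract sketch.
# NOTE: the URL below is a placeholder — use the real link from the table above.
import io
import urllib.request
import zipfile

zip_url = "https://example.com/whisper-small-en-onnx.zip"  # placeholder URL (assumption)

with urllib.request.urlopen(zip_url) as resp:
    zipfile.ZipFile(io.BytesIO(resp.read())).extractall("whisper-small-en-onnx")
```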
## Size Comparison vs GGML Q5_0

All models are **smaller** than their GGML Q5_0 equivalents:

- Medium English: 486MB vs 515MB GGML ✅ (-29MB)
- Small models: ~85-110MB vs 182MB GGML ✅ (-72 to -97MB)
- Large v3 Turbo: 530MB vs 574MB GGML ✅ (-44MB)
## Contents of Each Zip

Each zip file contains 7 files needed for inference:

### ONNX Model Files
- `encoder_model_quantized.onnx` - Audio encoder (processes mel spectrograms)
- `decoder_model_merged_quantized.onnx` - Text decoder (generates transcription)
- `decoder_with_past_model_quantized.onnx` - Optimized decoder with KV caching

### Configuration Files
- `config.json` - Model configuration
- `generation_config.json` - Generation parameters
- `preprocessor_config.json` - Audio preprocessing settings
- `tokenizer.json` - Tokenizer vocabulary
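To confirm which role each ONNX file plays, listing the inputs and outputs each session declares is a quick check. The tensor names in the comments below (`input_features`, `last_hidden_state`, `input_ids`) are what optimum-style Whisper exports typically use; treat them as an assumption to verify, not something this repo guarantees:

```python
import onnxruntime as ort

enc = ort.InferenceSession("encoder_model_quantized.onnx")
dec = ort.InferenceSession("decoder_with_past_model_quantized.onnx")

# Typical optimum-style Whisper export (assumption — verify on your files):
#   encoder: 'input_features' (log-mel spectrogram) -> 'last_hidden_state'
#   decoder: 'input_ids' + 'encoder_hidden_states' + past_key_values.* inputs
print([i.name for i in enc.get_inputs()], "->", [o.name for o in enc.get_outputs()])
print([i.name for i in dec.get_inputs()], "->", [o.name for o in dec.get_outputs()])
```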
## Usage

### C# with ONNX Runtime
```csharp
using System.IO;
using Microsoft.ML.OnnxRuntime;

// Point at the folder the zip was extracted to
var modelPath = "path/to/extracted/model/";

// Initialize with DirectML support (requires the Microsoft.ML.OnnxRuntime.DirectML package)
var sessionOptions = new SessionOptions();
sessionOptions.AppendExecutionProvider_DML(0);

var encoderSession = new InferenceSession(
    Path.Combine(modelPath, "encoder_model_quantized.onnx"), sessionOptions);
var decoderSession = new InferenceSession(
    Path.Combine(modelPath, "decoder_with_past_model_quantized.onnx"), sessionOptions);
```
### Python with ONNX Runtime
```python
import onnxruntime as ort

# Prefer DirectML, fall back to CPU (add 'CUDAExecutionProvider' for NVIDIA CUDA)
providers = ['DmlExecutionProvider', 'CPUExecutionProvider']
encoder_session = ort.InferenceSession('encoder_model_quantized.onnx', providers=providers)
decoder_session = ort.InferenceSession('decoder_with_past_model_quantized.onnx', providers=providers)
```
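For a complete transcription pipeline without hand-rolling the token-by-token decoding loop, Hugging Face Optimum can drive these sessions for you. This is a hedged sketch, assuming the extracted folder follows the standard Hugging Face ONNX layout, that the bundled config files are sufficient for `WhisperProcessor`, and that your audio is 16 kHz mono:

```python
import soundfile as sf
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from transformers import WhisperProcessor

model_path = "whisper-small-en-onnx"  # extracted zip from the table above (assumption)

processor = WhisperProcessor.from_pretrained(model_path)
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    model_path,
    encoder_file_name="encoder_model_quantized.onnx",
    decoder_file_name="decoder_model_merged_quantized.onnx",
    provider="CPUExecutionProvider",  # or "DmlExecutionProvider" / "CUDAExecutionProvider"
)

audio, sr = sf.read("speech.wav")  # Whisper expects 16 kHz mono input
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
generated = model.generate(inputs.input_features)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```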
## Features

✅ **DirectML Support** - Works with any DirectX 12 GPU (AMD, Intel, NVIDIA)
✅ **CUDA Support** - Accelerated inference on NVIDIA GPUs
✅ **CPU Fallback** - Automatic fallback to CPU if no GPU is available (see the check after this list)
✅ **Quantized** - INT8/INT4 quantization for smaller size and faster inference
✅ **Complete** - All files needed for inference included
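The CPU fallback is easy to verify at runtime: `onnxruntime.get_available_providers()` reports what your installed build supports, so the provider list can be built defensively instead of hard-coded:

```python
import onnxruntime as ort

# Keep only the preferred providers this build actually supports; CPU is always last.
preferred = ["DmlExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available] or ["CPUExecutionProvider"]
print("Using providers:", providers)
```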
## Model Sources

These models are repackaged from:
- [Distil-Whisper](https://huggingface.co/distil-whisper) (English models)
- [ONNX Community](https://huggingface.co/onnx-community) (Multilingual models)

## License

Models inherit their original licenses:
- Distil-Whisper models: MIT License
- Whisper models: MIT License

## Version History

- **v1.0.0** - Initial release with 5 optimized models