vlad-m-dev committed on
Commit
469261b
·
verified ·
1 Parent(s): c728323

Upload 2 files

Files changed (2)
  1. README.md +117 -3
  2. combined_tokenizer_embedded_model.onnx +3 -0
README.md CHANGED
@@ -1,3 +1,117 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ base_model:
+ - Xenova/distiluse-base-multilingual-cased-v2
+ pipeline_tag: feature-extraction
+ tags:
+ - feature-extraction
+ - sentence-embeddings
+ - sentence-transformers
+ - sentence-similarity
+ - semantic-search
+ - vector-search
+ - retrieval-augmented-generation
+ - multilingual
+ - cross-lingual
+ - low-resource
+ - merged-model
+ - combined-model
+ - tokenizer-embedded
+ - tokenizer-integrated
+ - standalone
+ - all-in-one
+ - quantized
+ - int8
+ - int8-quantization
+ - optimized
+ - efficient
+ - fast-inference
+ - low-latency
+ - lightweight
+ - small-model
+ - edge-ready
+ - arm64
+ - edge-device
+ - mobile-device
+ - on-device
+ - mobile-inference
+ - tablet
+ - smartphone
+ - embedded-ai
+ - onnx
+ - onnx-runtime
+ - onnx-model
+ - transformers
+ - MiniLM
+ - MiniLM-L12-v2
+ - paraphrase
+ - usecase-ready
+ - plug-and-play
+ - production-ready
+ - deployment-ready
+ - real-time
+ - fasttext
+ - distiluse
+
+ ---
+
+ # 🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)
+
+ A highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.
+
+ Built on `distiluse-base-multilingual-cased-v2`, the model has been:
+
+ - 🔁 **Merged with its tokenizer** into a single ONNX file
+ - ⚙️ **Extended with a custom preprocessing layer**
+ - ⚡ **Quantized to INT8** and ARM64-ready
+ - 🧪 **Extensively tested across real-world NLP tasks**
+ - 🛠️ **Bug-fixed** relative to the original `sentence-transformers` quantized export, which produced inaccurate cosine similarities
+
+ ---
+
+ ## 🚀 Key Features
+
+ - 🧩 **Single-file architecture**: no external tokenizer, vocab files, or `transformers` library required.
+ - ⚡ **93% faster inference** on mobile compared to the original model.
+ - 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
+ - 🧠 **Output = pure embeddings**: pass a string, get a 768-dim vector. That's it.
+ - 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
+ - 📱 **Ideal for edge AI, mobile, and offline scenarios.**
+
+ ---
+
+ ## 🤖 Author
+ @vlad-m-dev. Built for offline edge AI on phones and tablets.
+ Telegram: https://t.me/dwight_schrute_engineer
+
+ ---
+
+ ## 🐍 Python Example
+ ```python
+ import numpy as np
+ import onnxruntime as ort
+ from onnxruntime_extensions import get_library_path
+
+ # Register the onnxruntime-extensions custom ops; the tokenizer
+ # embedded in the ONNX graph depends on them.
+ sess_options = ort.SessionOptions()
+ sess_options.register_custom_ops_library(get_library_path())
+
+ session = ort.InferenceSession(
+     'combined_tokenizer_embedded_model.onnx',
+     sess_options=sess_options,
+     providers=['CPUExecutionProvider']
+ )
+
+ # The model consumes raw strings; tokenization happens inside the graph.
+ input_feed = {"text": np.asarray(['something..'])}
+ outputs = session.run(None, input_feed)
+ embedding = outputs[0]
+ ```
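Downstream tasks such as sentence similarity compare two such embeddings by cosine similarity. A minimal NumPy helper (a sketch: `cosine_similarity` is not shipped with the model, and the toy vectors below stand in for real `embedding` outputs):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical directions score 1.0, orthogonal directions 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

Scores near 1.0 indicate semantically similar sentences regardless of language.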
+
+ ---
+
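For semantic search, embed the corpus once and rank it against each query by cosine similarity. A minimal NumPy sketch (the `top_k` helper and the toy vectors are illustrative, not part of the model; in practice the rows would be model outputs):

```python
import numpy as np

def top_k(query, corpus, k=3):
    """Indices of the k corpus rows most similar to the query, best first."""
    q = np.asarray(query, dtype=np.float32)
    c = np.asarray(corpus, dtype=np.float32)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per corpus row
    return np.argsort(-scores)[:k].tolist()

corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([1.0, 0.1])
print(top_k(query, corpus, k=2))  # → [0, 2]
```

Normalizing both sides once keeps the ranking step a single matrix-vector product, which scales well on edge devices.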
+ ## 🟨 JavaScript Example
+ ```javascript
+ // Assumes onnxruntime-node; the in-graph tokenizer requires a runtime
+ // build that includes the onnxruntime-extensions custom ops.
+ const { InferenceSession, Tensor } = require('onnxruntime-node');
+
+ const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);
+ const inputTensor = new Tensor('string', ['something..'], [1]);
+ const feeds = { text: inputTensor };
+ const outputMap = await session.run(feeds);
+ const embedding = outputMap.text_embedding.data;
+ ```
combined_tokenizer_embedded_model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:827d1a23e47f8c68a5788e58a06e137edf52aa568a6e4c852f4ce79f21b8a205
+ size 136313389