onnx-community/distiluse-base-multilingual-v2-merged-onnx

+---
+license: mit
+base_model:
+- Xenova/distiluse-base-multilingual-cased-v2
+pipeline_tag: feature-extraction
+tags:
+- feature-extraction
+- sentence-embeddings
+- sentence-transformers
+- sentence-similarity
+- semantic-search
+- vector-search
+- retrieval-augmented-generation
+- multilingual
+- cross-lingual
+- low-resource
+- merged-model
+- combined-model
+- tokenizer-embedded
+- tokenizer-integrated
+- standalone
+- all-in-one
+- quantized
+- int8
+- int8-quantization
+- optimized
+- efficient
+- fast-inference
+- low-latency
+- lightweight
+- small-model
+- edge-ready
+- arm64
+- edge-device
+- mobile-device
+- on-device
+- mobile-inference
+- tablet
+- smartphone
+- embedded-ai
+- onnx
+- onnx-runtime
+- onnx-model
+- transformers
+- MiniLM
+- MiniLM-L12-v2
+- paraphrase
+- usecase-ready
+- plug-and-play
+- production-ready
+- deployment-ready
+- real-time
+- fasttext
+- distiluse
+---
+# 🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)
+This is a highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.
+Built upon `distiluse-base-multilingual-cased-v2`, the model has been:
+- 🔁 **Merged with its tokenizer** into a single ONNX file
+- ⚙️ **Extended with a custom preprocessing layer**
+- ⚡ **Quantized to INT8** and ARM64-ready
+- 🧪 **Extensively tested across real-world NLP tasks**
+- 🛠️ **Bug-fixed** relative to the original quantized `sentence-transformers` version, which produced inaccurate cosine similarities
+---
+## 🚀 Key Features
+- 🧩 **Single-file architecture**: no need for external tokenizer, vocab, or `transformers` library.
+- ⚡ **93% faster inference** on mobile compared to the original model.
+- 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
+- 🧠 **Output = pure embeddings**: pass a string, get a 768-dim vector. That’s it.
+- 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
+- 📱 **Ideal for edge-AI, mobile, and offline scenarios.**
+---
+## 🤖 Author
+@vlad-m-dev. Built for offline edge AI on phones and tablets.
+Telegram: https://t.me/dwight_schrute_engineer
+---
+## 🐍 Python Example
+```python
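+import numpy as np
+import onnxruntime as ort
+from onnxruntime_extensions import get_library_path
+
+# Register the onnxruntime-extensions custom ops used by the embedded tokenizer.
+sess_options = ort.SessionOptions()
+sess_options.register_custom_ops_library(get_library_path())
+session = ort.InferenceSession(
+    'combined_tokenizer_embedded_model.onnx',  # the merged model file in this repo
+    sess_options=sess_options,
+    providers=['CPUExecutionProvider']
+)
+# The model takes raw text; no separate tokenizer step is needed.
+input_feed = {"text": np.asarray(['something..'])}
+outputs = session.run(None, input_feed)
+embedding = outputs[0]  # sentence embedding for the input text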
+```
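Since the model card highlights cosine similarity, a minimal sketch of comparing two embeddings may be useful. It is dependency-free and works on any flat sequence of floats; the helper name `cosine_similarity` and the usage lines in the comments are illustrative assumptions, not part of the model or its repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length 1-D sequences of floats.

    Returns 0.0 when either vector has zero norm, to avoid dividing by zero.
    """
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return float(dot / (norm_a * norm_b))


# Hypothetical usage with the session from the example above
# (each output is flattened, since its exact shape is an assumption here):
#   emb_en = session.run(None, {"text": np.asarray(["Good morning"])})[0].ravel()
#   emb_uk = session.run(None, {"text": np.asarray(["Доброго ранку"])})[0].ravel()
#   print(cosine_similarity(emb_en, emb_uk))
```

Values close to 1 suggest the two sentences are semantically similar. For large collections, a vectorized NumPy version would likely be faster than this pure-Python loop.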
+---
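+## 🌐 JavaScript Example
+```javascript
+// Assumption: onnxruntime-react-native with onnxruntime-extensions enabled; adjust the import for your runtime.
+import { InferenceSession, Tensor } from 'onnxruntime-react-native';
+// EMBEDDING_FULL_MODEL_PATH is the path to the merged .onnx file.
+const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);
+// The model takes raw text as a string tensor of shape [1].
+const inputTensor = new Tensor('string', ['something..'], [1]);
+const feeds = { text: inputTensor };
+const outputMap = await session.run(feeds);
+const embedding = outputMap.text_embedding.data;
+```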

combined_tokenizer_embedded_model.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:827d1a23e47f8c68a5788e58a06e137edf52aa568a6e4c852f4ce79f21b8a205
+size 136313389