---
license: mit
base_model:
- Xenova/distiluse-base-multilingual-cased-v2
pipeline_tag: feature-extraction
tags:
- feature-extraction
- sentence-embeddings
- sentence-transformers
- sentence-similarity
- semantic-search
- vector-search
- retrieval-augmented-generation
- multilingual
- cross-lingual
- low-resource
- merged-model
- combined-model
- tokenizer-embedded
- tokenizer-integrated
- standalone
- all-in-one
- quantized
- int8
- int8-quantization
- optimized
- efficient
- fast-inference
- low-latency
- lightweight
- small-model
- edge-ready
- arm64
- edge-device
- mobile-device
- on-device
- mobile-inference
- tablet
- smartphone
- embedded-ai
- onnx
- onnx-runtime
- onnx-model
- transformers
- MiniLM
- MiniLM-L12-v2
- paraphrase
- usecase-ready
- plug-and-play
- production-ready
- deployment-ready
- real-time
- fasttext
- distiluse
---
# 🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)
This is a quantized, fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and many more.

Built upon `distiluse-base-multilingual-cased-v2`, the model has been:

- 🔗 **Merged with its tokenizer** into a single ONNX file (see the sketch after this list)
- ⚙️ **Extended with a custom preprocessing layer**
- ⚡ **Quantized to INT8** and ARM64-ready
- 🧪 **Extensively tested across real-world NLP tasks**
- 🛠️ **Bug-fixed** relative to the original quantized `sentence-transformers` export, which produced inaccurate cosine-similarity scores
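
For readers curious how a tokenizer ends up inside the ONNX graph: `onnxruntime-extensions` can export a Hugging Face tokenizer as an ONNX preprocessing graph, `onnx.compose` can stitch it onto the encoder, and `onnxruntime.quantization` handles the INT8 step. The sketch below is illustrative only, not this model's exact build pipeline; the file names and the `input_ids`/`attention_mask` I/O names are assumptions.

```python
# Illustrative sketch only -- not the exact pipeline used to build this model.
# Assumes the tokenizer is supported by onnxruntime_extensions and that the
# encoder graph exposes "input_ids" / "attention_mask" inputs.
import onnx
from onnx import compose
from transformers import AutoTokenizer
from onnxruntime_extensions import gen_processing_models
from onnxruntime.quantization import quantize_dynamic, QuantType

tokenizer = AutoTokenizer.from_pretrained(
    "sentence-transformers/distiluse-base-multilingual-cased-v2"
)

# Export the tokenizer as an ONNX graph (string in -> token ids out).
pre_model, _ = gen_processing_models(tokenizer, pre_kwargs={})

encoder = onnx.load("encoder.onnx")  # hypothetical path to the exported encoder

# Stitch tokenizer outputs to encoder inputs; the io_map names are assumptions.
merged = compose.merge_models(
    pre_model,
    encoder,
    io_map=[("input_ids", "input_ids"), ("attention_mask", "attention_mask")],
)
onnx.save(merged, "merged.onnx")

# Dynamic INT8 quantization of the merged graph.
quantize_dynamic("merged.onnx", "model.onnx", weight_type=QuantType.QInt8)
```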
---

## 🚀 Key Features
- 🧩 **Single-file architecture**: no external tokenizer, vocab files, or `transformers` library required.
- ⚡ **93% faster inference** on mobile compared to the original model.
- 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
- 🧠 **Output = pure embeddings**: pass a string, get a 768-dim vector. That's it.
- 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
- 📱 **Ideal for edge AI, mobile, and offline scenarios.**
---

## 🤖 Author

Built by @vlad-m-dev for offline edge AI on phones and tablets.
Telegram: https://t.me/dwight_schrute_engineer
---

## 🐍 Python Example
```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the onnxruntime-extensions custom ops (needed for the embedded tokenizer).
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())

session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

# Raw strings go straight in; tokenization happens inside the graph.
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)
embedding = outputs[0]
```
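
Since the headline fix here is accurate cosine similarity, a quick sanity check is to embed a paraphrase pair in two languages and compare the vectors. This sketch reuses `session` and `np` from the example above and assumes the model returns one pooled vector per input string.

```python
# Sanity check: cosine similarity between multilingual paraphrases.
# Assumes `session` from the example above and one pooled vector per input.
sentences = ["I love machine learning", "Я люблю машинне навчання"]  # EN / UK pair
vecs = session.run(None, {"text": np.asarray(sentences)})[0]

a, b = vecs[0], vecs[1]
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")  # paraphrases should score high
```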
---

## 🌐 JS Example
```JavaScript
// Assumes onnxruntime-node; the runtime build must include
// onnxruntime-extensions custom ops for the embedded tokenizer.
import { InferenceSession, Tensor } from 'onnxruntime-node';

const EMBEDDING_FULL_MODEL_PATH = './model.onnx'; // path to the merged model file

const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);

// A string tensor of shape [1]; the model tokenizes internally.
const inputTensor = new Tensor('string', ['something..'], [1]);
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data;
```
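
If `text_embedding` is not the output name in your build, inspect `session.outputNames` (and `session.inputNames`) to confirm the exact tensor names.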