---
license: mit
base_model:
- Xenova/distiluse-base-multilingual-cased-v2
pipeline_tag: feature-extraction
tags:
- feature-extraction
- sentence-embeddings
- sentence-transformers
- sentence-similarity
- semantic-search
- vector-search
- retrieval-augmented-generation
- multilingual
- cross-lingual
- low-resource
- merged-model
- combined-model
- tokenizer-embedded
- tokenizer-integrated
- standalone
- all-in-one
- quantized
- int8
- int8-quantization
- optimized
- efficient
- fast-inference
- low-latency
- lightweight
- small-model
- edge-ready
- arm64
- edge-device
- mobile-device
- on-device
- mobile-inference
- tablet
- smartphone
- embedded-ai
- onnx
- onnx-runtime
- onnx-model
- transformers
- MiniLM
- MiniLM-L12-v2
- paraphrase
- usecase-ready
- plug-and-play
- production-ready
- deployment-ready
- real-time
- fasttext
- distiluse
---
# 🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)
This is a quantized, fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and many more.

Built upon `distiluse-base-multilingual-cased-v2`, the model has been:

- 🔗 **Merged with its tokenizer** into a single ONNX file (see the sketch after this list)
- ⚙️ **Extended with a custom preprocessing layer**
- ⚡ **Quantized to INT8** and ARM64-ready
- 🧪 **Extensively tested across real-world NLP tasks**
- 🛠️ **Bug-fixed** relative to the original quantized `sentence-transformers` export, which produced inaccurate cosine-similarity scores
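
For readers curious how a tokenizer ends up inside the ONNX graph: `onnxruntime-extensions` can export a Hugging Face tokenizer as an ONNX preprocessing graph, `onnx.compose` can stitch it onto the encoder, and `onnxruntime.quantization` handles the INT8 step. The sketch below is illustrative only, not this model's exact build pipeline; the file names and the `input_ids`/`attention_mask` I/O names are assumptions.

```python
# Illustrative sketch only -- not the exact pipeline used to build this model.
# Assumes the tokenizer is supported by onnxruntime_extensions and that the
# encoder graph exposes "input_ids" / "attention_mask" inputs.
import onnx
from onnx import compose
from transformers import AutoTokenizer
from onnxruntime_extensions import gen_processing_models
from onnxruntime.quantization import quantize_dynamic, QuantType

tokenizer = AutoTokenizer.from_pretrained(
    "sentence-transformers/distiluse-base-multilingual-cased-v2"
)

# Export the tokenizer as an ONNX graph (string in -> token ids out).
pre_model, _ = gen_processing_models(tokenizer, pre_kwargs={})

encoder = onnx.load("encoder.onnx")  # hypothetical path to the exported encoder

# Stitch tokenizer outputs to encoder inputs; the io_map names are assumptions.
merged = compose.merge_models(
    pre_model,
    encoder,
    io_map=[("input_ids", "input_ids"), ("attention_mask", "attention_mask")],
)
onnx.save(merged, "merged.onnx")

# Dynamic INT8 quantization of the merged graph.
quantize_dynamic("merged.onnx", "model.onnx", weight_type=QuantType.QInt8)
```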
---

## 🚀 Key Features
- 🧩 **Single-file architecture**: no external tokenizer, vocab files, or `transformers` library required.
- ⚡ **93% faster inference** on mobile compared to the original model.
- 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
- 🧠 **Output = pure embeddings**: pass a string, get a 768-dim vector. That's it.
- 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
- 📱 **Ideal for edge AI, mobile, and offline scenarios.**
---

## 🤖 Author

Built by @vlad-m-dev for offline edge AI on phones and tablets.
Telegram: https://t.me/dwight_schrute_engineer
---

## 🐍 Python Example
```python
import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the onnxruntime-extensions custom ops (needed for the embedded tokenizer).
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())

session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

# Raw strings go straight in; tokenization happens inside the graph.
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)
embedding = outputs[0]
```
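
Since the headline fix here is accurate cosine similarity, a quick sanity check is to embed a paraphrase pair in two languages and compare the vectors. This sketch reuses `session` and `np` from the example above and assumes the model returns one pooled vector per input string.

```python
# Sanity check: cosine similarity between multilingual paraphrases.
# Assumes `session` from the example above and one pooled vector per input.
sentences = ["I love machine learning", "Я люблю машинне навчання"]  # EN / UK pair
vecs = session.run(None, {"text": np.asarray(sentences)})[0]

a, b = vecs[0], vecs[1]
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")  # paraphrases should score high
```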
---

## 🌐 JS Example
```JavaScript
// Assumes onnxruntime-node; the runtime build must include
// onnxruntime-extensions custom ops for the embedded tokenizer.
import { InferenceSession, Tensor } from 'onnxruntime-node';

const EMBEDDING_FULL_MODEL_PATH = './model.onnx'; // path to the merged model file

const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);

// A string tensor of shape [1]; the model tokenizes internally.
const inputTensor = new Tensor('string', ['something..'], [1]);
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data;
```
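
If `text_embedding` is not the output name in your build, inspect `session.outputNames` (and `session.inputNames`) to confirm the exact tensor names.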