EmbeddedLLM/all-MiniLM-L6-v2-onnx-o3-cpu

Jia Huei Tan committed on Feb 16, 2024

Commit a8fed56 · 1 Parent(s): b1997b9

Update README

Files changed (1)

README.md +43 -0


@@ -1,3 +1,46 @@
 ---
+pipeline_tag: sentence-similarity
+tags:
+  - sentence-transformers
+  - feature-extraction
+  - sentence-similarity
+language: en
 license: apache-2.0
 ---
+# ONNX Conversion of [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
+This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
+## Usage
+```python
+import torch
+import torch.nn.functional as F
+from optimum.onnxruntime import ORTModelForFeatureExtraction
+from transformers import AutoTokenizer
+# Example sentences to embed
+sentences = [
+    "The llama (/ˈlɑːmə/) (Lama glama) is a domesticated South American camelid.",
+    "The alpaca (Lama pacos) is a species of South American camelid mammal.",
+    "The vicuña (Lama vicugna) (/vɪˈkuːnjə/) is one of the two wild South American camelids.",
+]
+model_name = "EmbeddedLLM/all-MiniLM-L6-v2-onnx-o3-cpu"
+device = "cpu"
+provider = "CPUExecutionProvider"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = ORTModelForFeatureExtraction.from_pretrained(
+    model_name, use_io_binding=True, provider=provider, device_map=device
+)
+inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
+inputs = inputs.to(device)
+token_embeddings = model(**inputs).last_hidden_state
+# Mean pooling: average the token embeddings, using the attention mask to ignore padding
+att_mask = inputs["attention_mask"].unsqueeze(-1).expand(token_embeddings.size()).float()
+embeddings = torch.sum(token_embeddings * att_mask, 1) / torch.clamp(att_mask.sum(1), min=1e-9)
+embeddings = F.normalize(embeddings, p=2, dim=1)
+print(embeddings.cpu().numpy().shape)
+```
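The pooling and normalization steps in the usage snippet can be hard to read in tensor form. The following is a minimal, dependency-free sketch of the same arithmetic for a single sentence, plus the cosine similarity typically used for semantic search. The function names (`mean_pool`, `l2_normalize`, `cosine_similarity`) and the toy vectors are hypothetical and chosen for illustration only; they are not part of this model's API, and real embeddings have 384 dimensions rather than 2.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    # Guard against an all-padding input, mirroring torch.clamp(..., min=1e-9).
    denom = max(count, 1e-9)
    return [t / denom for t in total]


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity is the dot product of the two unit-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy input (hypothetical values): three token positions, two dimensions.
# The last position is padding, so it must not affect the pooled result.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Because the embeddings produced by the usage snippet are already L2-normalized, their cosine similarity reduces to a plain dot product, which is why normalized embeddings are convenient for similarity search.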