rvo committed on
Commit 01943a4 · verified · 1 Parent(s): c88f220

Upload README.md

Files changed (1):
  1. README.md +39 -18

README.md CHANGED
@@ -100,15 +100,22 @@ for i, query in enumerate(queries):
     print(f"Query: {query}")
     for j, doc in enumerate(documents):
         print(f"  Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
-
-# Query: What is machine learning?
-# Similarity: 0.9063 | Document 0: Machine learning is a subset of ...
-# Similarity: 0.7287 | Document 1: Neural networks are trained ...
-#
-# Query: How does neural network training work?
-# Similarity: 0.6725 | Document 0: Machine learning is a subset of ...
-# Similarity: 0.8287 | Document 1: Neural networks are trained ...
 ```
+
+<details>
+
+<summary>See example output</summary>
+
+```
+Query: What is machine learning?
+Similarity: 0.9063 | Document 0: Machine learning is a subset of ...
+Similarity: 0.7287 | Document 1: Neural networks are trained ...
+
+Query: How does neural network training work?
+Similarity: 0.6725 | Document 0: Machine learning is a subset of ...
+Similarity: 0.8287 | Document 1: Neural networks are trained ...
+```
+</details>
 
 ## Transformers.js
 
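For context on the hunk above: the README's scoring loop assumes a `scores` matrix of query-document cosine similarities. A minimal self-contained sketch of that pattern, using toy unit-norm NumPy vectors in place of real model embeddings (so the similarity values here are illustrative, not the model's):

```python
import numpy as np

queries = ["What is machine learning?", "How does neural network training work?"]
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are trained with backpropagation.",
]

# Toy unit-norm embeddings standing in for model output (2 queries x 4 dims, 2 docs x 4 dims)
query_embeds = np.array([[0.6, 0.8, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0]])
doc_embeds = np.array([[0.8, 0.6, 0.0, 0.0], [0.0, 0.8, 0.6, 0.0]])

# For normalized embeddings, cosine similarity reduces to a dot product
scores = query_embeds @ doc_embeds.T

for i, query in enumerate(queries):
    print(f"Query: {query}")
    for j, doc in enumerate(documents):
        print(f"  Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
```

With these toy vectors, each query scores highest against its matching document, mirroring the shape of the README's example output.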
@@ -123,7 +130,7 @@ You can then use the model to compute embeddings like this:
 import { AutoModel, AutoTokenizer, matmul } from "@huggingface/transformers";
 
 // Download from the 🤗 Hub
-const model_id = "onnx-community/mdbr-leaf-mt-ONNX";
+const model_id = "MongoDB/mdbr-leaf-mt";
 const tokenizer = await AutoTokenizer.from_pretrained(model_id);
 const model = await AutoModel.from_pretrained(model_id, {
   dtype: "fp32", // Options: "fp32" | "q8" | "q4"
@@ -216,13 +223,20 @@ similarities = model.similarity(query_embeds, doc_embeds)
 print('After MRL:')
 print(f"* Embeddings dimension: {query_embeds.shape[1]}")
 print(f"* Similarities: \n\t{similarities}")
+```
+
+<details>
+
+<summary>See example output</summary>
 
-# After MRL:
-# * Embeddings dimension: 256
-# * Similarities:
-# tensor([[0.9164, 0.7219],
-#         [0.6682, 0.8393]], device='cuda:0')
 ```
+After MRL:
+* Embeddings dimension: 256
+* Similarities:
+tensor([[0.9164, 0.7219],
+        [0.6682, 0.8393]], device='cuda:0')
+```
+</details>
 
 ## Vector Quantization
 Vector quantization, for example to `int8` or `binary`, can be performed as follows:
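The MRL (Matryoshka) step referenced in the hunk above keeps only the leading dimensions of each embedding before comparing them. A minimal NumPy sketch of that idea, using random toy vectors and an assumed 768-dim full size truncated to 256 (the real snippet operates on actual model embeddings, so the printed similarities will differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy full-size embeddings standing in for model output (768 dims assumed here)
query_embeds = rng.normal(size=(2, 768))
doc_embeds = rng.normal(size=(2, 768))

# Matryoshka (MRL) truncation: keep only the leading 256 dimensions...
dim = 256
query_embeds = query_embeds[:, :dim]
doc_embeds = doc_embeds[:, :dim]

# ...then re-normalize so dot products are cosine similarities again
query_embeds /= np.linalg.norm(query_embeds, axis=1, keepdims=True)
doc_embeds /= np.linalg.norm(doc_embeds, axis=1, keepdims=True)

similarities = query_embeds @ doc_embeds.T

print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")
```

The re-normalization after truncation is the detail that matters: without it, the truncated dot products are no longer cosine similarities.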
@@ -247,13 +261,20 @@ similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T
 print('After quantization:')
 print(f"* Embeddings type: {query_embeds.dtype}")
 print(f"* Similarities: \n{similarities}")
+```
 
-# After quantization:
-# * Embeddings type: int8
-# * Similarities:
-# [[2202032 1422868]
-#  [1421197 1845580]]
+<details>
+
+<summary>See example output</summary>
+
+```
+After quantization:
+* Embeddings type: int8
+* Similarities:
+[[2202032 1422868]
+ [1421197 1845580]]
 ```
+</details>
 
 # Evaluation
 
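The quantization hunk above compares `int8` embeddings with `query_embeds.astype(int) @ doc_embeds.astype(int).T`. A minimal sketch of one way such embeddings can be produced, using toy float vectors and a simple symmetric scale to [-127, 127] (the README's actual quantization scheme may differ; only the widened-matmul line is taken from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy float embeddings standing in for model output
query_embeds = rng.uniform(-1.0, 1.0, size=(2, 64))
doc_embeds = rng.uniform(-1.0, 1.0, size=(2, 64))

# Scalar int8 quantization (illustrative): map the observed float range onto [-127, 127]
scale = 127.0 / np.abs(np.concatenate([query_embeds, doc_embeds])).max()
query_embeds = np.round(query_embeds * scale).astype(np.int8)
doc_embeds = np.round(doc_embeds * scale).astype(np.int8)

# Widen to int before the matmul so the dot products don't overflow int8
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T

print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")
```

The `astype(int)` widening before the matrix product is essential: a 64-dim dot product of int8 values can exceed the int8 range by orders of magnitude, which is why the README's example similarities are seven-digit integers.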
 