alvarobartt (HF Staff) committed on
Commit a62d6ce · verified · 1 Parent(s): 1c1d773

Update README.md

Files changed (1)
  1. README.md +35 -39
README.md CHANGED
@@ -116,6 +116,40 @@ for doc, score in doc_score_pairs:
  print(score, doc)
  ```

+ ## Usage (Text Embeddings Inference (TEI))
+
+ [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) is a blazing fast inference solution for text embeddings models.
+
+ - CPU:
+ ```bash
+ docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
+ --model-id sentence-transformers/multi-qa-mpnet-base-dot-v1 \
+ --pooling cls \
+ --dtype float16
+ ```
+
+ - NVIDIA GPU:
+ ```bash
+ docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest \
+ --model-id sentence-transformers/multi-qa-mpnet-base-dot-v1 \
+ --pooling cls \
+ --dtype float16
+ ```
+
+ Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
+ ```bash
+ curl http://localhost:8080/v1/embeddings \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "sentence-transformers/multi-qa-mpnet-base-dot-v1",
+ "input": "How many people live in London?"
+ }'
+ ```
+
+ Or check the [Text Embeddings Inference API specification](https://huggingface.github.io/text-embeddings-inference/) instead.
+
+ ----
+
  ## Technical Details

  In the following some technical details how this model must be used:
@@ -129,7 +163,6 @@ In the following some technical details how this model must be used:

  ----

-
  ## Background

  The project aims to train sentence embedding models on very large sentence level datasets using a self-supervised
@@ -146,8 +179,6 @@ Our model is intended to be used for semantic search: It encodes queries / quest

  Note that there is a limit of 512 word pieces: Text longer than that will be truncated. Further note that the model was just trained on input text up to 250 word pieces. It might not work well for longer text.

-
-
  ## Training procedure

  The full training script is accessible in this current repository: `train_script.py`.
@@ -163,9 +194,6 @@ We sampled each dataset given a weighted probability which configuration is deta

  The model was trained with [MultipleNegativesRankingLoss](https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) using CLS-pooling, dot-product as similarity function, and a scale of 1.

-
-
-
  | Dataset | Number of training tuples |
  |--------------------------------------------------------|:--------------------------:|
  | [WikiAnswers](https://github.com/afader/oqa#wikianswers-corpus) Duplicate question pairs from WikiAnswers | 77,427,422 |
@@ -185,36 +213,4 @@ The model was trained with [MultipleNegativesRankingLoss](https://www.sbert.net/
  | [Natural Questions (NQ)](https://ai.google.com/research/NaturalQuestions) (Question, Paragraph) pairs for 100k real Google queries with relevant Wikipedia paragraph | 100,231 |
  | [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/) (Question, Paragraph) pairs from SQuAD2.0 dataset | 87,599 |
  | [TriviaQA](https://huggingface.co/datasets/trivia_qa) (Question, Evidence) pairs | 73,346 |
- | **Total** | **214,988,242** |
-
- ## Usage (Text Embeddings Inference (TEI))
-
- [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference) is a blazing fast inference solution for text embeddings models.
-
- - CPU:
- ```bash
- docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
- --model-id sentence-transformers/multi-qa-mpnet-base-dot-v1 \
- --pooling cls \
- --dtype float16
- ```
-
- - NVIDIA GPU:
- ```bash
- docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest \
- --model-id sentence-transformers/multi-qa-mpnet-base-dot-v1 \
- --pooling cls \
- --dtype float16
- ```
-
- Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
- ```bash
- curl http://localhost:8080/v1/embeddings \
- -H "Content-Type: application/json" \
- -d '{
- "model": "sentence-transformers/multi-qa-mpnet-base-dot-v1",
- "input": "How many people live in London?"
- }'
- ```
-
- Or check the [Text Embeddings Inference API specification](https://huggingface.github.io/text-embeddings-inference/) instead.
+ | **Total** | **214,988,242** |
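
The `curl` request in the TEI section above can also be issued from Python. The following is a minimal sketch using only the standard library, not part of the README itself. It assumes a TEI container started with one of the `docker run` commands above is listening on `localhost:8080`, and that the response follows the OpenAI Embeddings layout (a `data` list whose entries carry an `embedding` field). The helper names `build_embeddings_request` and `embed` are hypothetical, introduced here for illustration only.

```python
import json
import urllib.request

# Assumption: TEI is reachable on the port mapped by the docker commands (-p 8080:80).
TEI_URL = "http://localhost:8080/v1/embeddings"
MODEL_ID = "sentence-transformers/multi-qa-mpnet-base-dot-v1"


def build_embeddings_request(text, url=TEI_URL, model=MODEL_ID):
    """Build (but do not send) the same POST request as the curl example."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, url=TEI_URL, model=MODEL_ID, timeout=30):
    """Send the request and return the embedding as a list of floats.

    Assumes an OpenAI-style response body: {"data": [{"embedding": [...]}], ...}.
    """
    request = build_embeddings_request(text, url=url, model=model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["data"][0]["embedding"]


# Usage (requires a running TEI server):
#   vector = embed("How many people live in London?")
#   print(len(vector))
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server; `embed` is the only function that touches the network.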