---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:2000
- loss:CoSENTLoss
base_model: avsolatorio/GIST-small-Embedding-v0
widget:
- source_sentence: is alexa compatible with tv?
  sentences:
  - Whether eating an egg every day is healthy or unhealthy depends on what else
    you eat each day. The Voedingscentrum advises varying between fish, legumes,
    meat and eggs. Eating 2-3 eggs per week fits into a healthy diet. Vegetarians
    can eat 3-4 eggs per week.
  - The price was right, the size was right and as it turns out this PYLE TV has the
    best picture quality of all 5 TVs that our family watches! The setup was super
    easy with no hassle. I would recommend it to anyone!
  - According to the Association of British Insurers, insurance companies will look
    into a policyholder's medical profile if they give up smoking. They'll commonly
    seek a report from a policyholder's family doctor. If this raises concerns, they
    may ask a policyholder to have a chest X-ray.
- source_sentence: is nyada a real college?
  sentences:
  - The instruments have been classified as Wind instruments (aero phonic) including
    Bansuri and Nagaswaram; String instruments (chordophonic) including Dilruba and
    Veena; Percussion instruments (membranophonic) including Tabla, Mridangam and
    (idiophonic) Bortal, and Ghatam.
  - This service is currently offered free of charge by the bank. You can get the
    last 'Available' balance of your account (by an SMS) by giving a Missed Call to
    18008431122. You can get the Mini Statement (by an SMS) for last 5 transactions
    in your account by giving a Missed Call to 18008431133. 1.
  - King Size Bed Known as a standard 5ft bed or 150cm wide by 200cm in length.
- source_sentence: is europe bigger than australia?
  sentences:
  - Although this is just five per cent of the world's land mass (149.45 million square
    kilometres), Australia is the planet's sixth largest country after Russia, Canada,
    China, the United States of America and Brazil. ... almost as great as that of
    the United States of America. about 50 per cent greater than Europe, and.
  - The recommended dose of evening primrose oil is 8 to 12 capsules a day, at a dose
    of 500 milligrams per capsule. A range of evening primrose oil products are available
    for purchase online.
  - This includes a three-year law degree, a one-year LPC and finally a two-year training
    contract with a law firm. Studying a non-law subject for your degree means you'll
    need to take the GDL conversion course before your LPC, which adds one year to
    the total.
- source_sentence: how long does money take to transfer boi?
  sentences:
  - 'When will it take more than one working day? It will take more than one working
    day to reach your payee''s bank when: You make a payment online after 3.30pm in
    the Republic of Ireland or after 4.30pm in Northern Ireland and Great Britain
    on a working day. Your payment will begin to process on the next working day.'
  - U.S. citizens travelling to South Korea for business or tourism do not need a
    visa. ... Although obtaining a visa in advance can ease the entry process, as
    long as you have a valid U.S. passport, you can enter the Republic of Korea without
    a visa for a stay of up to 90 days if you are a tourist or on business.
  - Structural insulated panels (SIPs) are a high performance building system for
    residential and commercial construction. The panels consist of an insulating foam
    core sandwiched between two structural facings, typically oriented strand board
    (OSB). SIPs are manufactured under factory controlled conditions.
- source_sentence: where are bussola shoes made?
  sentences:
  - According to Harvard University, biking at a moderate speed of 12 to 13.9 miles
    per hour will cause a 155-pound person to burn 298 calories in 30 minutes. At
    a faster rate of 14 to 15.9 miles per hour, a person of the same weight will burn
    372 calories.
  - If you had bought just one share of Microsoft at the IPO, you would now have 288
    shares after all the splits. Those shares would be worth $44,505 at the current
    stock quote of $154.53. A $5,000 investment would have purchased 238 shares at
    the IPO price.
  - FRAM opens the first plant devoted exclusively to the development and manufacture
    of heavy duty air filters and cartridges, in Nevada, Missouri.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on avsolatorio/GIST-small-Embedding-v0

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [avsolatorio/GIST-small-Embedding-v0](https://huggingface.co/avsolatorio/GIST-small-Embedding-v0) <!-- at revision 75e62fd210b9fde790430e0b2f040b0b00a021b1 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
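
To make the three modules concrete, here is a minimal hand-rolled sketch of the same pipeline using plain `transformers`: tokenize, take the `[CLS]` token embedding (the CLS pooling configured above), and L2-normalize. It assumes the finetuned weights load directly with `AutoModel`, which holds for the standard sentence-transformers repository layout:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("moshew/gist_small_ft_gooaq_v2")
model = AutoModel.from_pretrained("moshew/gist_small_ft_gooaq_v2")

batch = tokenizer(
    ["where are bussola shoes made?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # [batch, seq_len, 384]

cls_embedding = token_embeddings[:, 0]              # Pooling module: CLS token
embedding = F.normalize(cls_embedding, p=2, dim=1)  # Normalize module
print(embedding.shape)  # torch.Size([1, 384])
```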

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("moshew/gist_small_ft_gooaq_v2")
# Run inference
sentences = [
    'where are bussola shoes made?',
    'FRAM opens the first plant devoted exclusively to the development and manufacture of heavy duty air filters and cartridges, in Nevada, Missouri.',
    'According to Harvard University, biking at a moderate speed of 12 to 13.9 miles per hour will cause a 155-pound person to burn 298 calories in 30 minutes. At a faster rate of 14 to 15.9 miles per hour, a person of the same weight will burn 372 calories.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
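
Because the embeddings are unit-normalized and compared with cosine similarity, the same API supports a simple semantic-search pattern. A short sketch (the query and corpus strings are illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("moshew/gist_small_ft_gooaq_v2")

query = "how long does a bank transfer take?"
corpus = [
    "It will take more than one working day to reach your payee's bank when you pay after the cut-off time.",
    "King Size Bed Known as a standard 5ft bed or 150cm wide by 200cm in length.",
]

# Rank corpus entries by cosine similarity to the query
query_embedding = model.encode([query])
corpus_embeddings = model.encode(corpus)
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, len(corpus)]
best = scores.argmax().item()
print(corpus[best])
```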

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,000 training samples
* Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence1                                                                         | sentence2                                                                           | label                                                         |
  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:--------------------------------------------------------------|
  | type    | string                                                                            | string                                                                              | float                                                         |
  | details | <ul><li>min: 8 tokens</li><li>mean: 12.05 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 59.28 tokens</li><li>max: 118 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.5</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence1                                                                             | sentence2                                                                                                                                                                                                                                                                                                                            | label            |
  |:--------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Though there are some minor differences in shape and nutrients, Rapid-Rise Yeast is (pretty much) the same as Instant Yeast and Bread Machine Yeast. ... Also, Rapid-Rise Yeast is a little more potent than Active Dry Yeast and can be mixed in with your dry ingredients directly.</code>                                   | <code>1.0</code> |
  | <code>what is the difference between rapid rise yeast and bread machine yeast?</code> | <code>Application. To clarify, double-acting baking powder is “regular” baking powder. Single-acting baking powder exists, but when a recipe calls for baking powder it means double-acting. And even if a recipe does call for single-acting, you can substitute double-acting without worrying about it changing the recipe.</code> | <code>0.0</code> |
  | <code>are light kits universal for ceiling fans?</code>                               | <code>Not all Universal Light Kits are actually Universal. They can be universal to only that manufacturer. ... Casablanca and Hunter Ceiling Fan Light Kits are universal only to their own fans.</code>                                                                                                                            | <code>1.0</code> |
* Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "pairwise_cos_sim"
  }
  ```
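
The exact implementation lives in the library, but the objective itself is compact: for every two training pairs where one gold label is higher than the other, CoSENT penalizes the model when the lower-labeled pair receives the higher cosine similarity. A hedged, self-contained sketch of that computation (not the library's code; `scale` matches the parameter above):

```python
import torch

def cosent_loss(cos_sims: torch.Tensor, labels: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """cos_sims[k]: cosine similarity of the k-th (sentence1, sentence2) pair;
    labels[k]: its gold score. Loss = log(1 + sum over (i, j) with
    labels[i] > labels[j] of exp(scale * (cos_sims[j] - cos_sims[i])))."""
    diffs = scale * (cos_sims[None, :] - cos_sims[:, None])  # diffs[i, j] = scale * (s_j - s_i)
    mask = labels[:, None] > labels[None, :]                 # pair i should outrank pair j
    logits = diffs[mask]
    zero = torch.zeros(1, device=cos_sims.device)            # contributes the "1 +" term
    return torch.logsumexp(torch.cat([zero, logits]), dim=0)

scores = torch.tensor([0.9, 0.2, 0.7])  # model's pairwise cosine similarities
gold = torch.tensor([1.0, 0.0, 1.0])    # gold labels, as in the samples above
print(cosent_loss(scores, gold))
```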

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
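
Putting those values together, a comparable run could be set up along these lines. This is a sketch under stated assumptions (the dataset construction is illustrative, since the card does not name the source file), not the exact training script:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("avsolatorio/GIST-small-Embedding-v0")

# Illustrative stand-in for the 2,000-sample (sentence1, sentence2, label) dataset
train_dataset = Dataset.from_dict({
    "sentence1": ["are light kits universal for ceiling fans?"],
    "sentence2": ["Not all Universal Light Kits are actually Universal."],
    "label": [1.0],
})

args = SentenceTransformerTrainingArguments(
    output_dir="gist_small_ft_gooaq_v2",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    dataloader_num_workers=4,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CoSENTLoss(model),  # scale=20.0 and pairwise_cos_sim are the defaults
)
trainer.train()
```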

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 0.008 | 1    | 1.9382        |


### Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CoSENTLoss
```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->