SentenceTransformer based on sentence-transformers/LaBSE

This is a sentence-transformers model finetuned from sentence-transformers/LaBSE. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/LaBSE
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
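
Since the model ends with a Normalize() module (see the Full Model Architecture below), its embeddings are unit vectors, so cosine similarity reduces to a plain dot product. A minimal sketch (the placeholder sentences and variable names are illustrative, not from the original card):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/validadted_falabse_onV9f")

# Placeholder sentences, only to show that the outputs are unit-length
a, b = model.encode(["first sentence", "second sentence"])

print(np.linalg.norm(a))                                        # ~1.0, thanks to the Normalize() module
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(cosine, np.dot(a, b)))                         # True: the dot product equals the cosine similarity here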

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
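
The listed limits can be checked directly on a loaded model; a small sketch using the Sentence Transformers API:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/validadted_falabse_onV9f")

print(model)                                     # prints the module stack shown above
print(model.max_seq_length)                      # 256 (longer inputs are truncated)
print(model.get_sentence_embedding_dimension())  # 768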

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/validadted_falabse_onV9f")
# Run inference
sentences = [
    # "To become a Quora Top Writer, how many views and answers are needed?"
    'برای تبدیل شدن به نویسنده برتر Quora ، چند بازدید و پاسخ لازم است؟',
    # "How can I become a Quora Top Writer and get more upvotes and better stats?"
    'چگونه می توانم نویسنده برتر Quora شوم ، از صعود بیشتر و آمار بهتر استفاده کنم؟',
    # "I'm looking to buy a new bike: Suzuki Gixxer 155 or Honda Hornet 160r. Which one should I buy?"
    'من به دنبال خرید دوچرخه جدید هستم.Suzuki Gixxer 155 یا Honda Hornet 160r.کدام یک را بخرید؟',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
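
For the semantic-search use case mentioned above, a hedged sketch that reuses the example sentences as corpus and query; util.semantic_search ranks corpus entries by cosine similarity:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("codersan/validadted_falabse_onV9f")

corpus = [
    'چگونه می توانم نویسنده برتر Quora شوم ، از صعود بیشتر و آمار بهتر استفاده کنم؟',
    'من به دنبال خرید دوچرخه جدید هستم.Suzuki Gixxer 155 یا Honda Hornet 160r.کدام یک را بخرید؟',
]
query = 'برای تبدیل شدن به نویسنده برتر Quora ، چند بازدید و پاسخ لازم است؟'

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Each query gets its top_k most similar corpus entries, highest cosine similarity first
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}, {'corpus_id': 1, 'score': ...}]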

Training Details

Training Dataset

Unnamed Dataset

  • Size: 131,157 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 6 tokens, mean: 15.78 tokens, max: 86 tokens
    • positive: string; min: 5 tokens, mean: 15.52 tokens, max: 57 tokens
  • Samples (anchor → positive), with English translations in brackets:
    • anchor: وقتی سوال من به عنوان "این سوال ممکن است به ویرایش نیاز داشته باشد" چه کاری باید انجام دهم ، اما نمی توانم دلیل آن را پیدا کنم؟
      [What should I do when my question is marked as "this question may need editing" but I can't find the reason?]
      positive: چرا سوال من به عنوان نیاز به پیشرفت مشخص شده است؟
      [Why has my question been marked as needing improvement?]
    • anchor: چگونه می توانید یک فایل رمزگذاری شده را با دانستن اینکه این یک فایل تصویری است بدون دانستن گسترش پرونده یا کلید ، رمزگشایی کنید؟
      [How can you decrypt an encrypted file, knowing that it is an image file, without knowing the file extension or the key?]
      positive: چگونه می توانید یک فایل رمزگذاری شده را رمزگشایی کنید و بدانید که این یک فایل تصویری است بدون اینکه از پسوند پرونده اطلاع داشته باشید؟
      [How can you decrypt an encrypted file, knowing that it is an image file, without knowing the file extension?]
    • anchor: احساس می کنم خودکشی می کنم ، چگونه باید با آن برخورد کنم؟
      [I feel suicidal; how should I deal with it?]
      positive: احساس می کنم خودکشی می کنم.چه کاری باید انجام دهم؟
      [I feel suicidal. What should I do?]
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
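
A hedged sketch of how this loss is constructed in Sentence Transformers (not the card's original training script); the scale and similarity function match the parameters above:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/LaBSE")

# MultipleNegativesRankingLoss treats each (anchor, positive) pair as a positive example
# and uses the other positives in the same batch as in-batch negatives.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)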
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • learning_rate: 5e-06
  • weight_decay: 0.01
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • push_to_hub: True
  • hub_model_id: codersan/validadted_falabse_onV9f
  • eval_on_start: True
  • batch_sampler: no_duplicates
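
A minimal sketch of how these non-default values map onto SentenceTransformerTrainingArguments and SentenceTransformerTrainer; the two-pair train_dataset (reused from the examples above) and the output_dir are placeholders, the real dataset has 131,157 pairs, and the eval/Hub options are omitted for brevity:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/LaBSE")
loss = losses.MultipleNegativesRankingLoss(model)

# Placeholder anchor/positive pairs, only to make the sketch runnable
train_dataset = Dataset.from_dict({
    "anchor": [
        'چگونه می توانم نویسنده برتر Quora شوم ، از صعود بیشتر و آمار بهتر استفاده کنم؟',
        'احساس می کنم خودکشی می کنم ، چگونه باید با آن برخورد کنم؟',
    ],
    "positive": [
        'برای تبدیل شدن به نویسنده برتر Quora ، چند بازدید و پاسخ لازم است؟',
        'احساس می کنم خودکشی می کنم.چه کاری باید انجام دهم؟',
    ],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",                         # placeholder path
    per_device_train_batch_size=12,
    learning_rate=5e-6,
    weight_decay=0.01,
    num_train_epochs=1,
    warmup_ratio=0.1,
    batch_sampler=BatchSamplers.NO_DUPLICATES,   # no duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()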

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-06
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: codersan/validadted_falabse_onV9f
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0 0 -
0.0091 100 0.1214
0.0183 200 0.0776
0.0274 300 0.0555
0.0366 400 0.0507
0.0457 500 0.0423
0.0549 600 0.0328
0.0640 700 0.0391
0.0732 800 0.0164
0.0823 900 0.0155
0.0915 1000 0.0138
0.1006 1100 0.0219
0.1098 1200 0.0267
0.1189 1300 0.0251
0.1281 1400 0.033
0.1372 1500 0.0151
0.1464 1600 0.0129
0.1555 1700 0.023
0.1647 1800 0.026
0.1738 1900 0.0264
0.1830 2000 0.0105
0.1921 2100 0.0262
0.2013 2200 0.0118
0.2104 2300 0.0223
0.2196 2400 0.043
0.2287 2500 0.0187
0.2379 2600 0.0135
0.2470 2700 0.0165
0.2562 2800 0.0191
0.2653 2900 0.0247
0.2745 3000 0.0207
0.2836 3100 0.0213
0.2928 3200 0.0193
0.3019 3300 0.0137
0.3111 3400 0.0208
0.3202 3500 0.0228
0.3294 3600 0.0213
0.3385 3700 0.0184
0.3477 3800 0.016
0.3568 3900 0.0131
0.3660 4000 0.0133
0.3751 4100 0.0117
0.3843 4200 0.0201
0.3934 4300 0.0121
0.4026 4400 0.0309
0.4117 4500 0.0177
0.4209 4600 0.02
0.4300 4700 0.035
0.4392 4800 0.0167
0.4483 4900 0.0108
0.4575 5000 0.016
0.4666 5100 0.0158
0.4758 5200 0.0102
0.4849 5300 0.0167
0.4941 5400 0.0252
0.5032 5500 0.015
0.5124 5600 0.0321
0.5215 5700 0.0144
0.5306 5800 0.0228
0.5398 5900 0.0222
0.5489 6000 0.0234
0.5581 6100 0.0111
0.5672 6200 0.0265
0.5764 6300 0.0224
0.5855 6400 0.0237
0.5947 6500 0.0289
0.6038 6600 0.016
0.6130 6700 0.01
0.6221 6800 0.0129
0.6313 6900 0.0201
0.6404 7000 0.01
0.6496 7100 0.0126
0.6587 7200 0.0194
0.6679 7300 0.0204
0.6770 7400 0.0203
0.6862 7500 0.0141
0.6953 7600 0.015
0.7045 7700 0.0221
0.7136 7800 0.0155
0.7228 7900 0.0142
0.7319 8000 0.0112
0.7411 8100 0.0142
0.7502 8200 0.0141
0.7594 8300 0.0136
0.7685 8400 0.0328
0.7777 8500 0.0103
0.7868 8600 0.0156
0.7960 8700 0.0208
0.8051 8800 0.0262
0.8143 8900 0.0234
0.8234 9000 0.0128
0.8326 9100 0.0125
0.8417 9200 0.0309
0.8509 9300 0.012
0.8600 9400 0.0127
0.8692 9500 0.0119
0.8783 9600 0.0297
0.8875 9700 0.0208
0.8966 9800 0.0178
0.9058 9900 0.0216
0.9149 10000 0.0272
0.9241 10100 0.021
0.9332 10200 0.019
0.9424 10300 0.0104
0.9515 10400 0.0229
0.9607 10500 0.0161
0.9698 10600 0.0161
0.9790 10700 0.0243
0.9881 10800 0.0263
0.9973 10900 0.0112

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
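
To roughly reproduce this environment, the listed versions can be pinned with pip (a suggestion, not part of the original card; the +cu121 PyTorch build additionally requires the matching CUDA wheel index):

pip install sentence-transformers==3.3.1 transformers==4.47.0 torch==2.5.1 accelerate==1.2.1 datasets==3.2.0 tokenizers==0.21.0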

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}