SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
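
The three modules map one-to-one onto a plain transformers pipeline: encode with the BertModel, mean-pool the token embeddings over the attention mask, then L2-normalize. A minimal sketch of that equivalence, assuming the repository's standard transformers-compatible weights (the helper name encode is ours, not part of any API):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("codersan/validadted_e5smallStudent")
model = AutoModel.from_pretrained("codersan/validadted_e5smallStudent")
model.eval()

def encode(texts):
    # Tokenize with the same 512-token limit as the Transformer module above
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**batch).last_hidden_state  # [batch, seq_len, 384]
    # (1) Pooling: mean over non-padding tokens only
    mask = batch["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # (2) Normalize: unit-length L2 normalization
    return F.normalize(pooled, p=2, dim=1)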

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/validadted_e5smallStudent")
# Run inference
sentences = [
    'داشتن هزاران دنبال کننده در Quora چگونه است؟',  # "What is it like to have thousands of followers on Quora?"
    'چه چیزی است که ده ها هزار دنبال کننده در Quora داشته باشید؟',  # "What is it like to have tens of thousands of followers on Quora?"
    'چگونه Airprint HP OfficeJet 4620 با HP LaserJet Enterprise M606X مقایسه می شود؟',  # "How does AirPrint on the HP OfficeJet 4620 compare with the HP LaserJet Enterprise M606X?"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
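
Because the Normalize module makes every embedding unit-length, cosine similarity reduces to a dot product, so the model drops straight into semantic search. A small illustrative sketch; the corpus and query below are made-up examples, not taken from the training data:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("codersan/validadted_e5smallStudent")

corpus = [
    "How do I learn to play the guitar?",
    "What is the capital of France?",
    "Which video editing software is best?",
]
query = "What's the best program for editing videos?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Embeddings are already L2-normalized, so dot-product ranking == cosine ranking
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))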

Training Details

Training Dataset

Unnamed Dataset

  • Size: 172,826 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 6 tokens, mean: 16.19 tokens, max: 84 tokens
    • sentence2: string; min: 6 tokens, mean: 16.5 tokens, max: 52 tokens
    • score: float; min: 0.73, mean: 0.94, max: 1.0
  • Samples (sentence1 / sentence2 / score; English glosses added in parentheses):
    • تفاوت بین تحلیلگر تحقیقات بازار و تحلیلگر تجارت چیست؟ / تفاوت بین تحقیقات بازاریابی و تحلیلگر تجارت چیست؟ / 0.9806554317474365
      ("What is the difference between a market research analyst and a business analyst?" / "What is the difference between marketing research and a business analyst?")
    • خوردن چه چیزی باعث دل درد میشود؟ / چه چیزی باعث رفع دل درد میشود؟ / 0.9417070150375366
      ("Eating what causes a stomach ache?" / "What relieves a stomach ache?")
    • بهترین نرم افزار ویرایش ویدیویی کدام است؟ / بهترین نرم افزار برای ویرایش ویدیو چیست؟ / 0.9928616285324097
      ("Which is the best video editing software?" / "What is the best software for editing video?")
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
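
In other words, the loss drives the cosine similarity of each trained pair toward its gold score. A minimal sketch of the computation, not the library's exact implementation:

import torch.nn.functional as F

def cosine_similarity_loss(embeddings1, embeddings2, gold_scores):
    # Predicted cosine similarity for each (sentence1, sentence2) pair
    predicted = F.cosine_similarity(embeddings1, embeddings2, dim=1)
    # loss_fct = MSELoss: penalize squared deviation from the gold score
    return F.mse_loss(predicted, gold_scores)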
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • learning_rate: 5e-06
  • weight_decay: 0.01
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • push_to_hub: True
  • hub_model_id: codersan/validadted_e5smallStudent
  • eval_on_start: True
  • batch_sampler: no_duplicates
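
Put together, a hedged sketch of a fine-tuning script that reproduces these non-default settings with the Sentence Transformers v3 trainer; the dataset construction and output_dir are placeholders for the unnamed 172,826-pair dataset described above:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Placeholder pairs; the real dataset has sentence1 / sentence2 / score columns
train_dataset = Dataset.from_dict({
    "sentence1": ["بهترین نرم افزار ویرایش ویدیویی کدام است؟"],
    "sentence2": ["بهترین نرم افزار برای ویرایش ویدیو چیست؟"],
    "score": [0.99],
})
eval_dataset = train_dataset  # placeholder; eval_strategy="steps" needs eval data

args = SentenceTransformerTrainingArguments(
    output_dir="e5smallStudent",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=12,
    learning_rate=5e-6,
    weight_decay=0.01,
    num_train_epochs=1,
    warmup_ratio=0.1,
    push_to_hub=True,
    hub_model_id="codersan/validadted_e5smallStudent",
    eval_on_start=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=CosineSimilarityLoss(model),  # MSE between cos-sim and gold score
)
trainer.train()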

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-06
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: codersan/validadted_e5smallStudent
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0 0 -
0.0069 100 0.0004
0.0139 200 0.0004
0.0208 300 0.0003
0.0278 400 0.0003
0.0347 500 0.0003
0.0417 600 0.0003
0.0486 700 0.0003
0.0555 800 0.0003
0.0625 900 0.0003
0.0694 1000 0.0003
0.0764 1100 0.0002
0.0833 1200 0.0002
0.0903 1300 0.0002
0.0972 1400 0.0002
0.1041 1500 0.0002
0.1111 1600 0.0002
0.1180 1700 0.0002
0.1250 1800 0.0002
0.1319 1900 0.0002
0.1389 2000 0.0002
0.1458 2100 0.0002
0.1527 2200 0.0002
0.1597 2300 0.0002
0.1666 2400 0.0002
0.1736 2500 0.0002
0.1805 2600 0.0002
0.1875 2700 0.0002
0.1944 2800 0.0002
0.2013 2900 0.0002
0.2083 3000 0.0002
0.2152 3100 0.0002
0.2222 3200 0.0002
0.2291 3300 0.0002
0.2361 3400 0.0002
0.2430 3500 0.0002
0.2499 3600 0.0002
0.2569 3700 0.0002
0.2638 3800 0.0002
0.2708 3900 0.0002
0.2777 4000 0.0002
0.2847 4100 0.0002
0.2916 4200 0.0002
0.2985 4300 0.0002
0.3055 4400 0.0002
0.3124 4500 0.0002
0.3194 4600 0.0002
0.3263 4700 0.0002
0.3333 4800 0.0002
0.3402 4900 0.0002
0.3471 5000 0.0002
0.3541 5100 0.0002
0.3610 5200 0.0002
0.3680 5300 0.0002
0.3749 5400 0.0002
0.3819 5500 0.0002
0.3888 5600 0.0002
0.3958 5700 0.0002
0.4027 5800 0.0002
0.4096 5900 0.0002
0.4166 6000 0.0002
0.4235 6100 0.0002
0.4305 6200 0.0002
0.4374 6300 0.0002
0.4444 6400 0.0002
0.4513 6500 0.0002
0.4582 6600 0.0002
0.4652 6700 0.0002
0.4721 6800 0.0002
0.4791 6900 0.0002
0.4860 7000 0.0002
0.4930 7100 0.0002
0.4999 7200 0.0002
0.5068 7300 0.0002
0.5138 7400 0.0002
0.5207 7500 0.0002
0.5277 7600 0.0002
0.5346 7700 0.0002
0.5416 7800 0.0002
0.5485 7900 0.0002
0.5554 8000 0.0002
0.5624 8100 0.0002
0.5693 8200 0.0002
0.5763 8300 0.0002
0.5832 8400 0.0002
0.5902 8500 0.0002
0.5971 8600 0.0002
0.6040 8700 0.0002
0.6110 8800 0.0002
0.6179 8900 0.0002
0.6249 9000 0.0002
0.6318 9100 0.0002
0.6388 9200 0.0002
0.6457 9300 0.0002
0.6526 9400 0.0002
0.6596 9500 0.0002
0.6665 9600 0.0002
0.6735 9700 0.0002
0.6804 9800 0.0002
0.6874 9900 0.0002
0.6943 10000 0.0002
0.7012 10100 0.0002
0.7082 10200 0.0002
0.7151 10300 0.0002
0.7221 10400 0.0002
0.7290 10500 0.0002
0.7360 10600 0.0002
0.7429 10700 0.0002
0.7498 10800 0.0002
0.7568 10900 0.0002
0.7637 11000 0.0002
0.7707 11100 0.0002
0.7776 11200 0.0002
0.7846 11300 0.0002
0.7915 11400 0.0002
0.7984 11500 0.0002
0.8054 11600 0.0002
0.8123 11700 0.0002
0.8193 11800 0.0002
0.8262 11900 0.0002
0.8332 12000 0.0002
0.8401 12100 0.0002
0.8470 12200 0.0002
0.8540 12300 0.0002
0.8609 12400 0.0002
0.8679 12500 0.0002
0.8748 12600 0.0002
0.8818 12700 0.0002
0.8887 12800 0.0002
0.8956 12900 0.0002
0.9026 13000 0.0002
0.9095 13100 0.0002
0.9165 13200 0.0002
0.9234 13300 0.0002
0.9304 13400 0.0002
0.9373 13500 0.0002
0.9442 13600 0.0002
0.9512 13700 0.0002
0.9581 13800 0.0002
0.9651 13900 0.0002
0.9720 14000 0.0002
0.9790 14100 0.0002
0.9859 14200 0.0002
0.9928 14300 0.0002
0.9998 14400 0.0002

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
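
To reproduce this environment, the same versions can be pinned at install time (the card's PyTorch build is the CUDA 12.1 variant, 2.5.1+cu121):

pip install torch==2.5.1 sentence-transformers==3.3.1 transformers==4.47.0 accelerate==1.2.1 datasets==3.2.0 tokenizers==0.21.0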

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}