SentenceTransformer based on hiiamsid/sentence_similarity_spanish_es

This is a sentence-transformers model fine-tuned from hiiamsid/sentence_similarity_spanish_es. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: hiiamsid/sentence_similarity_spanish_es
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
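
The Pooling module applies mean pooling over the token embeddings, masked by the attention mask. Below is a minimal sketch of the equivalent computation with the plain transformers API, assuming the BERT weights in this repository can also be loaded directly with AutoModel:

import torch
from transformers import AutoTokenizer, AutoModel

repo = "sd-dreambooth-library/mks-similarity"
tokenizer = AutoTokenizer.from_pretrained(repo)
bert = AutoModel.from_pretrained(repo)

# Tokenize with the same 512-token limit as the Transformer module above
encoded = tokenizer(
    ["¿Qué modelo corresponde al código YP107?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = bert(**encoded).last_hidden_state  # (batch, tokens, 768)

# Mean pooling: average the token embeddings, ignoring padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])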

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sd-dreambooth-library/mks-similarity")
# Run inference
sentences = [
    '¿Qué modelo corresponde al código YP107?',
    '¿Cuánto cuesta la llave P123VE?',
    '¿La llave TE4 pertenece a qué marca?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Training Details

Training Dataset

Unnamed Dataset

  • Size: 91,044 training samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 10 tokens, mean: 17.16 tokens, max: 41 tokens
    • sentence2: string; min: 9 tokens, mean: 17.26 tokens, max: 40 tokens
    • label: float; min: 0.0, mean: 0.51, max: 1.0
  • Samples (sentence1 | sentence2 | label):
    • ¿CY1 HELLO KITTY CORAZONES tiene un precio accesible? | ¿Cuánto cuesta la llave CY43? | 0.0
    • ¿Qué modelo corresponde al código OP12? | ¿Me puedes decir cuánto vale CHEVROLET GM29? | 0.0
    • ¿YALE PERSONAJE HULK tiene un precio accesible? | ¿Cuánto debo pagar por la llave con código YP117? | 1.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
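
Concretely, the training pairs can be loaded as a Hugging Face Dataset with sentence1, sentence2, and a float label, which is the input format CosineSimilarityLoss expects. A minimal sketch using two of the sample rows above:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

train_dataset = Dataset.from_dict({
    "sentence1": [
        "¿Qué modelo corresponde al código OP12?",
        "¿YALE PERSONAJE HULK tiene un precio accesible?",
    ],
    "sentence2": [
        "¿Me puedes decir cuánto vale CHEVROLET GM29?",
        "¿Cuánto debo pagar por la llave con código YP117?",
    ],
    "label": [0.0, 1.0],
})

model = SentenceTransformer("hiiamsid/sentence_similarity_spanish_es")
# CosineSimilarityLoss regresses the cosine similarity of the two sentence
# embeddings onto the label; by default it uses torch.nn.MSELoss, as listed above
loss = CosineSimilarityLoss(model)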
    

Evaluation Dataset

Unnamed Dataset

  • Size: 10,116 evaluation samples
  • Columns: sentence1, sentence2, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence1: string; min: 10 tokens, mean: 17.08 tokens, max: 42 tokens
    • sentence2: string; min: 10 tokens, mean: 17.33 tokens, max: 42 tokens
    • label: float; min: 0.0, mean: 0.5, max: 1.0
  • Samples (sentence1 | sentence2 | label):
    • ¿Cuál es el precio de la AM3 AMERICAN LOCK? | ¿Cuánto debo pagar por la llave con código AM3? | 1.0
    • ¿Cuánto debo pagar por la llave con código MAS9? | ¿La llave MAS9 pertenece a qué marca? | 1.0
    • ¿La llave YP113 pertenece a qué marca? | ¿Qué llave tiene el código E029? | 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 2
  • warmup_ratio: 0.1
  • fp16: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
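
Putting the non-default hyperparameters together, a fine-tuning run along these lines could be reproduced with SentenceTransformerTrainer. A minimal sketch that continues from the dataset and loss sketch in the Training Dataset section; the output directory is an assumption, and eval_steps=500 matches the validation-loss logging interval in the Training Logs below:

from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="mks-similarity",        # assumed output directory
    num_train_epochs=2,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    eval_steps=500,                     # validation loss is logged every 500 steps
)

trainer = SentenceTransformerTrainer(
    model=model,                 # from the Training Dataset sketch
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # stand-in; in practice the 10,116 evaluation pairs
    loss=loss,
)
trainer.train()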

Training Logs

Epoch Step Training Loss Validation Loss
0.0351 100 0.1663 -
0.0703 200 0.1023 -
0.1054 300 0.0807 -
0.1405 400 0.0723 -
0.1757 500 0.0614 0.0535
0.2108 600 0.0569 -
0.2460 700 0.052 -
0.2811 800 0.0382 -
0.3162 900 0.0408 -
0.3514 1000 0.0358 0.0329
0.3865 1100 0.0353 -
0.4216 1200 0.032 -
0.4568 1300 0.0303 -
0.4919 1400 0.0275 -
0.5271 1500 0.0263 0.0223
0.5622 1600 0.0237 -
0.5973 1700 0.0215 -
0.6325 1800 0.0233 -
0.6676 1900 0.0198 -
0.7027 2000 0.022 0.0163
0.7379 2100 0.0185 -
0.7730 2200 0.0178 -
0.8082 2300 0.0168 -
0.8433 2400 0.018 -
0.8784 2500 0.0158 0.0127
0.9136 2600 0.0141 -
0.9487 2700 0.015 -
0.9838 2800 0.0131 -
1.0190 2900 0.0117 -
1.0541 3000 0.0106 0.0100
1.0892 3100 0.0082 -
1.1244 3200 0.0088 -
1.1595 3300 0.0084 -
1.1947 3400 0.0087 -
1.2298 3500 0.0093 0.0079
1.2649 3600 0.0106 -
1.3001 3700 0.0097 -
1.3352 3800 0.0074 -
1.3703 3900 0.0072 -
1.4055 4000 0.0094 0.0067
1.4406 4100 0.0062 -
1.4758 4200 0.0072 -
1.5109 4300 0.0081 -
1.5460 4400 0.0075 -
1.5812 4500 0.0071 0.0059
1.6163 4600 0.0049 -
1.6514 4700 0.0064 -
1.6866 4800 0.0072 -
1.7217 4900 0.0075 -
1.7569 5000 0.0062 0.0052
1.7920 5100 0.0061 -
1.8271 5200 0.0059 -
1.8623 5300 0.0062 -
1.8974 5400 0.005 -
1.9325 5500 0.0068 0.0048
1.9677 5600 0.0051 -

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.6.0
  • Datasets: 3.5.1
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}