Paper: Matryoshka Representation Learning (arXiv:2205.13147)
This is a sentence-transformers model finetuned from Qwen/Qwen2.5-0.5B-Instruct. It maps sentences and paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
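The Pooling layer above uses mean pooling (`pooling_mode_mean_tokens: True`): each sentence embedding is the average of its token embeddings, with padding tokens masked out. A minimal numpy sketch of that operation, using hypothetical random token embeddings rather than real model outputs:

```python
import numpy as np

# Hypothetical batch: 2 sequences, 4 token positions, hidden size 896 (as in this model)
token_embeddings = np.random.rand(2, 4, 896)
# Attention mask: 1 = real token, 0 = padding
attention_mask = np.array([[1, 1, 1, 0],
                           [1, 1, 0, 0]])

# Masked mean pooling: sum real-token embeddings, divide by the real-token count
mask = attention_mask[:, :, None]               # (2, 4, 1), broadcasts over the hidden dim
summed = (token_embeddings * mask).sum(axis=1)  # (2, 896)
counts = mask.sum(axis=1)                       # (2, 1)
sentence_embeddings = summed / counts           # (2, 896)

print(sentence_embeddings.shape)  # (2, 896)
```

Each row is exactly the mean of that sequence's non-padding token vectors, which is what the Pooling module produces before the embedding is returned.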
First install the Sentence Transformers library:

```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen1k")
# Run inference
sentences = [
    'When did the July Monarchy end?',
    'July Monarchy\nThe July Monarchy (French: Monarchie de Juillet) was a liberal constitutional monarchy in France under Louis Philippe I, starting with the July Revolution of 1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration (1814–1830). It began with the overthrow of the conservative government of Charles X, the last king of the House of Bourbon.',
    'July Monarchy\nDespite the return of the House of Bourbon to power, France was much changed from the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries remained an important force and the autocracy and hierarchy of the earlier era could not be fully restored. Economic changes, which had been underway long before the revolution, had progressed further during the years of turmoil and were firmly entrenched by 1815. These changes had seen power shift from the noble landowners to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic Code and efficient bureaucracy, also remained in place. These changes produced a unified central government that was fiscally sound and had much control over all areas of French life, a sharp difference from the complicated mix of feudal and absolutist traditions and institutions of pre-Revolutionary Bourbons.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 896]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
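Because the model was trained with MatryoshkaLoss at dimensions 896 and 768, embeddings can also be truncated to their first 768 dimensions and renormalized with little quality loss (Sentence Transformers supports this directly via the `truncate_dim` argument of `SentenceTransformer`). The effect can be sketched with numpy on placeholder vectors standing in for `model.encode(...)` output:

```python
import numpy as np

# Placeholder embeddings standing in for model.encode(...) output: 3 sentences, 896 dims
embeddings = np.random.rand(3, 896)

# Matryoshka truncation: keep the first 768 dimensions, then L2-renormalize
truncated = embeddings[:, :768]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)  # (3, 768)

# Cosine similarity is now a plain dot product of the unit-norm rows
similarities = truncated @ truncated.T
print(similarities.shape)  # (3, 3)
```

The renormalization step matters: after dropping dimensions the vectors no longer have unit norm, so dot products would not be cosine similarities without it.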
Evaluation with `EmbeddingSimilarityEvaluator` on `sts-dev-896` and `sts-dev-768` (semantic similarity at the full 896 dimensions and at 768 truncated dimensions):

| Metric | sts-dev-896 | sts-dev-768 |
|---|---|---|
| pearson_cosine | 0.4573 | 0.4455 |
| spearman_cosine | 0.4965 | 0.4897 |
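The `spearman_cosine` metric is the Spearman rank correlation between the cosine similarities of sentence pairs and the gold similarity scores: it measures whether the model ranks pairs in the same order as humans do, ignoring the absolute values. A self-contained sketch with toy scores (not the actual dev data), using a simple argsort-based ranking that assumes no ties:

```python
import numpy as np

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the ranks
    # (simple double-argsort ranking; assumes no ties in the inputs)
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Toy example: model cosine similarities vs. gold STS scores for 5 pairs
cos_sims = np.array([0.9, 0.2, 0.5, 0.7, 0.1])
gold     = np.array([4.8, 1.0, 2.5, 4.0, 0.5])
print(spearman(cos_sims, gold))  # 1.0 -- the two orderings agree exactly
```

In practice libraries such as `scipy.stats.spearmanr` are used, which also handle tied ranks.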
Training data: triplets with `query`, `response`, and `negative` columns (all strings). Sample rows:

| query | response | negative |
|---|---|---|
| Was there a year 0? | Year zero | 504 |
| When is the dialectical method used? | Dialectic | Derek Bentley case |
| What do Grasshoppers eat? | Grasshopper | Groundhog |
Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [896, 768],
    "matryoshka_weights": [1, 1],
    "n_dims_per_step": -1
}
```
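Conceptually, MatryoshkaLoss evaluates the inner loss once per listed dimension, truncating (and renormalizing) the embeddings to that dimension, and sums the results with the given weights. The inner MultipleNegativesRankingLoss is an in-batch InfoNCE: for each query, the candidates are all responses plus all hard negatives in the batch, and the correct candidate is that query's own response. A hedged numpy sketch with random stand-in embeddings (illustrative only, not the sentence-transformers implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4
# Stand-in embeddings for queries, positive responses, and hard negatives
q, p, n = (rng.normal(size=(batch, 896)) for _ in range(3))

def mnrl(q, p, n, scale=20.0):
    # MultipleNegativesRankingLoss: for query i, candidates are all in-batch
    # responses plus all hard negatives; the correct candidate is response i.
    b = q.shape[0]
    norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    q, cands = norm(q), np.vstack([norm(p), norm(n)])   # (b, d), (2*b, d)
    logits = scale * (q @ cands.T)                      # scaled cosine similarities, (b, 2*b)
    # Numerically stable log-softmax, then cross-entropy against the diagonal
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(b), np.arange(b)].mean())

# MatryoshkaLoss: run the inner loss at each dimension, weight, and sum
dims, weights = [896, 768], [1, 1]
loss = sum(w * mnrl(q[:, :d], p[:, :d], n[:, :d]) for d, w in zip(dims, weights))
print(loss)
```

Because the 768-dim prefix is trained with the same weight as the full 896-dim embedding, the prefix stays useful on its own, which is what makes the truncated evaluation above (`sts-dev-768`) nearly match the full-dimensional score.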
Non-default training hyperparameters:

```
eval_strategy: steps
per_device_train_batch_size: 12
per_device_eval_batch_size: 12
gradient_accumulation_steps: 4
num_train_epochs: 1
warmup_ratio: 0.3
bf16: True
batch_sampler: no_duplicates
```

All hyperparameters:

```
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 12
per_device_eval_batch_size: 12
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 4
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.3
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters: 
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
```

Training logs:

| Epoch | Step | Training Loss | sts-dev-896_spearman_cosine | sts-dev-768_spearman_cosine |
|---|---|---|---|---|
| 0.0002 | 10 | 4.4351 | - | - |
| 0.0003 | 20 | 4.6508 | - | - |
| 0.0005 | 30 | 4.7455 | - | - |
| 0.0007 | 40 | 4.5427 | - | - |
| 0.0008 | 50 | 4.3982 | - | - |
| 0.0010 | 60 | 4.3755 | - | - |
| 0.0012 | 70 | 4.4105 | - | - |
| 0.0013 | 80 | 5.2227 | - | - |
| 0.0015 | 90 | 5.8062 | - | - |
| 0.0017 | 100 | 5.7645 | - | - |
| 0.0018 | 110 | 5.9261 | - | - |
| 0.0020 | 120 | 5.8301 | - | - |
| 0.0022 | 130 | 5.7602 | - | - |
| 0.0023 | 140 | 5.9392 | - | - |
| 0.0025 | 150 | 5.7523 | - | - |
| 0.0027 | 160 | 5.8585 | - | - |
| 0.0029 | 170 | 5.7916 | - | - |
| 0.0030 | 180 | 5.8157 | - | - |
| 0.0032 | 190 | 5.7102 | - | - |
| 0.0034 | 200 | 5.5844 | - | - |
| 0.0035 | 210 | 5.5463 | - | - |
| 0.0037 | 220 | 5.5823 | - | - |
| 0.0039 | 230 | 5.5514 | - | - |
| 0.0040 | 240 | 5.5646 | - | - |
| 0.0042 | 250 | 5.5783 | - | - |
| 0.0044 | 260 | 5.5344 | - | - |
| 0.0045 | 270 | 5.523 | - | - |
| 0.0047 | 280 | 5.4969 | - | - |
| 0.0049 | 290 | 5.5407 | - | - |
| 0.0050 | 300 | 5.6171 | - | - |
| 0.0052 | 310 | 5.5581 | - | - |
| 0.0054 | 320 | 5.8903 | - | - |
| 0.0055 | 330 | 5.8675 | - | - |
| 0.0057 | 340 | 5.745 | - | - |
| 0.0059 | 350 | 5.6041 | - | - |
| 0.0060 | 360 | 5.5476 | - | - |
| 0.0062 | 370 | 5.3964 | - | - |
| 0.0064 | 380 | 5.3564 | - | - |
| 0.0065 | 390 | 5.3054 | - | - |
| 0.0067 | 400 | 5.2779 | - | - |
| 0.0069 | 410 | 5.206 | - | - |
| 0.0070 | 420 | 5.2168 | - | - |
| 0.0072 | 430 | 5.1645 | - | - |
| 0.0074 | 440 | 5.1797 | - | - |
| 0.0076 | 450 | 5.2526 | - | - |
| 0.0077 | 460 | 5.1768 | - | - |
| 0.0079 | 470 | 5.3519 | - | - |
| 0.0081 | 480 | 5.2982 | - | - |
| 0.0082 | 490 | 5.3229 | - | - |
| 0.0084 | 500 | 5.3758 | - | - |
| 0.0086 | 510 | 5.2478 | - | - |
| 0.0087 | 520 | 5.1799 | - | - |
| 0.0089 | 530 | 5.1088 | - | - |
| 0.0091 | 540 | 4.977 | - | - |
| 0.0092 | 550 | 4.9108 | - | - |
| 0.0094 | 560 | 4.811 | - | - |
| 0.0096 | 570 | 4.7203 | - | - |
| 0.0097 | 580 | 4.6499 | - | - |
| 0.0099 | 590 | 4.4548 | - | - |
| 0.0101 | 600 | 4.2891 | - | - |
| 0.0102 | 610 | 4.1881 | - | - |
| 0.0104 | 620 | 4.6 | - | - |
| 0.0106 | 630 | 4.5365 | - | - |
| 0.0107 | 640 | 4.3086 | - | - |
| 0.0109 | 650 | 4.0452 | - | - |
| 0.0111 | 660 | 3.9041 | - | - |
| 0.0112 | 670 | 4.3938 | - | - |
| 0.0114 | 680 | 4.3198 | - | - |
| 0.0116 | 690 | 4.1294 | - | - |
| 0.0117 | 700 | 4.077 | - | - |
| 0.0119 | 710 | 3.9174 | - | - |
| 0.0121 | 720 | 4.1629 | - | - |
| 0.0123 | 730 | 3.9611 | - | - |
| 0.0124 | 740 | 3.7768 | - | - |
| 0.0126 | 750 | 3.5842 | - | - |
| 0.0128 | 760 | 3.1196 | - | - |
| 0.0129 | 770 | 3.6288 | - | - |
| 0.0131 | 780 | 3.273 | - | - |
| 0.0133 | 790 | 2.7889 | - | - |
| 0.0134 | 800 | 2.5096 | - | - |
| 0.0136 | 810 | 1.8878 | - | - |
| 0.0138 | 820 | 2.3423 | - | - |
| 0.0139 | 830 | 1.7687 | - | - |
| 0.0141 | 840 | 2.0781 | - | - |
| 0.0143 | 850 | 2.4598 | - | - |
| 0.0144 | 860 | 1.7667 | - | - |
| 0.0146 | 870 | 2.6247 | - | - |
| 0.0148 | 880 | 1.916 | - | - |
| 0.0149 | 890 | 2.0817 | - | - |
| 0.0151 | 900 | 2.3679 | - | - |
| 0.0153 | 910 | 1.418 | - | - |
| 0.0154 | 920 | 2.7353 | - | - |
| 0.0156 | 930 | 1.992 | - | - |
| 0.0158 | 940 | 1.4564 | - | - |
| 0.0159 | 950 | 1.4154 | - | - |
| 0.0161 | 960 | 0.9499 | - | - |
| 0.0163 | 970 | 1.6304 | - | - |
| 0.0164 | 980 | 0.9264 | - | - |
| 0.0166 | 990 | 1.3278 | - | - |
| 0.0168 | 1000 | 1.686 | 0.4965 | 0.4897 |
Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

MatryoshkaLoss:

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```