Paper: Matryoshka Representation Learning (arXiv:2205.13147)
This is a sentence-transformers model finetuned from Qwen/Qwen2.5-0.5B-Instruct. It maps sentences and paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
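The Pooling layer above uses mean pooling (`pooling_mode_mean_tokens: True`): each sentence embedding is the average of its token embeddings, with padding tokens masked out. A minimal numpy sketch of that operation, using hypothetical random token embeddings rather than real model outputs:

```python
import numpy as np

# Hypothetical batch: 2 sequences, 4 token positions, hidden size 896 (as in this model)
token_embeddings = np.random.rand(2, 4, 896)
# Attention mask: 1 = real token, 0 = padding
attention_mask = np.array([[1, 1, 1, 0],
                           [1, 1, 0, 0]])

# Masked mean pooling: sum real-token embeddings, divide by the real-token count
mask = attention_mask[:, :, None]               # (2, 4, 1), broadcasts over the hidden dim
summed = (token_embeddings * mask).sum(axis=1)  # (2, 896)
counts = mask.sum(axis=1)                       # (2, 1)
sentence_embeddings = summed / counts           # (2, 896)

print(sentence_embeddings.shape)  # (2, 896)
```

Each row is exactly the mean of that sequence's non-padding token vectors, which is what the Pooling module produces before the embedding is returned.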
First install the Sentence Transformers library:

```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen1k")
# Run inference
sentences = [
    'When did the July Monarchy end?',
    'July Monarchy\nThe July Monarchy (French: Monarchie de Juillet) was a liberal constitutional monarchy in France under Louis Philippe I, starting with the July Revolution of 1830 and ending with the Revolution of 1848. It marks the end of the Bourbon Restoration (1814–1830). It began with the overthrow of the conservative government of Charles X, the last king of the House of Bourbon.',
    'July Monarchy\nDespite the return of the House of Bourbon to power, France was much changed from the era of the ancien régime. The egalitarianism and liberalism of the revolutionaries remained an important force and the autocracy and hierarchy of the earlier era could not be fully restored. Economic changes, which had been underway long before the revolution, had progressed further during the years of turmoil and were firmly entrenched by 1815. These changes had seen power shift from the noble landowners to the urban merchants. The administrative reforms of Napoleon, such as the Napoleonic Code and efficient bureaucracy, also remained in place. These changes produced a unified central government that was fiscally sound and had much control over all areas of French life, a sharp difference from the complicated mix of feudal and absolutist traditions and institutions of pre-Revolutionary Bourbons.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 896]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
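Because the model was trained with MatryoshkaLoss at dimensions 896 and 768, embeddings can also be truncated to their first 768 dimensions and renormalized with little quality loss (Sentence Transformers supports this directly via the `truncate_dim` argument of `SentenceTransformer`). The effect can be sketched with numpy on placeholder vectors standing in for `model.encode(...)` output:

```python
import numpy as np

# Placeholder embeddings standing in for model.encode(...) output: 3 sentences, 896 dims
embeddings = np.random.rand(3, 896)

# Matryoshka truncation: keep the first 768 dimensions, then L2-renormalize
truncated = embeddings[:, :768]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
print(truncated.shape)  # (3, 768)

# Cosine similarity is now a plain dot product of the unit-norm rows
similarities = truncated @ truncated.T
print(similarities.shape)  # (3, 3)
```

The renormalization step matters: after dropping dimensions the vectors no longer have unit norm, so dot products would not be cosine similarities without it.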
Evaluation with `EmbeddingSimilarityEvaluator` on `sts-dev-896` and `sts-dev-768` (semantic similarity at the full 896 dimensions and at 768 truncated dimensions):

| Metric | sts-dev-896 | sts-dev-768 |
|---|---|---|
| pearson_cosine | 0.4573 | 0.4455 |
| spearman_cosine | 0.4965 | 0.4897 |
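The `spearman_cosine` metric is the Spearman rank correlation between the cosine similarities of sentence pairs and the gold similarity scores: it measures whether the model ranks pairs in the same order as humans do, ignoring the absolute values. A self-contained sketch with toy scores (not the actual dev data), using a simple argsort-based ranking that assumes no ties:

```python
import numpy as np

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the ranks
    # (simple double-argsort ranking; assumes no ties in the inputs)
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Toy example: model cosine similarities vs. gold STS scores for 5 pairs
cos_sims = np.array([0.9, 0.2, 0.5, 0.7, 0.1])
gold     = np.array([4.8, 1.0, 2.5, 4.0, 0.5])
print(spearman(cos_sims, gold))  # 1.0 -- the two orderings agree exactly
```

In practice libraries such as `scipy.stats.spearmanr` are used, which also handle tied ranks.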
Training data: triplets with `query`, `response`, and `negative` columns (all strings). Sample rows:

| query | response | negative |
|---|---|---|
| Was there a year 0? | Year zero | 504 |
| When is the dialectical method used? | Dialectic | Derek Bentley case |
| What do Grasshoppers eat? | Grasshopper | Groundhog |
Loss: `MatryoshkaLoss` with these parameters:

```json
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [896, 768],
    "matryoshka_weights": [1, 1],
    "n_dims_per_step": -1
}
```
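Conceptually, MatryoshkaLoss evaluates the inner loss once per listed dimension, truncating (and renormalizing) the embeddings to that dimension, and sums the results with the given weights. The inner MultipleNegativesRankingLoss is an in-batch InfoNCE: for each query, the candidates are all responses plus all hard negatives in the batch, and the correct candidate is that query's own response. A hedged numpy sketch with random stand-in embeddings (illustrative only, not the sentence-transformers implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4
# Stand-in embeddings for queries, positive responses, and hard negatives
q, p, n = (rng.normal(size=(batch, 896)) for _ in range(3))

def mnrl(q, p, n, scale=20.0):
    # MultipleNegativesRankingLoss: for query i, candidates are all in-batch
    # responses plus all hard negatives; the correct candidate is response i.
    b = q.shape[0]
    norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    q, cands = norm(q), np.vstack([norm(p), norm(n)])   # (b, d), (2*b, d)
    logits = scale * (q @ cands.T)                      # scaled cosine similarities, (b, 2*b)
    # Numerically stable log-softmax, then cross-entropy against the diagonal
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(b), np.arange(b)].mean())

# MatryoshkaLoss: run the inner loss at each dimension, weight, and sum
dims, weights = [896, 768], [1, 1]
loss = sum(w * mnrl(q[:, :d], p[:, :d], n[:, :d]) for d, w in zip(dims, weights))
print(loss)
```

Because the 768-dim prefix is trained with the same weight as the full 896-dim embedding, the prefix stays useful on its own, which is what makes the truncated evaluation above (`sts-dev-768`) nearly match the full-dimensional score.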
Non-default training hyperparameters:

```
eval_strategy: steps
per_device_train_batch_size: 12
per_device_eval_batch_size: 12
gradient_accumulation_steps: 4
num_train_epochs: 1
warmup_ratio: 0.3
bf16: True
batch_sampler: no_duplicates
```

All hyperparameters:

```
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 12
per_device_eval_batch_size: 12
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 4
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.3
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters: 
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
```

Training logs:

| Epoch | Step | Training Loss | sts-dev-896_spearman_cosine | sts-dev-768_spearman_cosine |
|---|---|---|---|---|
| 0.0002 | 10 | 4.4351 | - | - |
| 0.0003 | 20 | 4.6508 | - | - |
| 0.0005 | 30 | 4.7455 | - | - |
| 0.0007 | 40 | 4.5427 | - | - |
| 0.0008 | 50 | 4.3982 | - | - |
| 0.0010 | 60 | 4.3755 | - | - |
| 0.0012 | 70 | 4.4105 | - | - |
| 0.0013 | 80 | 5.2227 | - | - |
| 0.0015 | 90 | 5.8062 | - | - |
| 0.0017 | 100 | 5.7645 | - | - |
| 0.0018 | 110 | 5.9261 | - | - |
| 0.0020 | 120 | 5.8301 | - | - |
| 0.0022 | 130 | 5.7602 | - | - |
| 0.0023 | 140 | 5.9392 | - | - |
| 0.0025 | 150 | 5.7523 | - | - |
| 0.0027 | 160 | 5.8585 | - | - |
| 0.0029 | 170 | 5.7916 | - | - |
| 0.0030 | 180 | 5.8157 | - | - |
| 0.0032 | 190 | 5.7102 | - | - |
| 0.0034 | 200 | 5.5844 | - | - |
| 0.0035 | 210 | 5.5463 | - | - |
| 0.0037 | 220 | 5.5823 | - | - |
| 0.0039 | 230 | 5.5514 | - | - |
| 0.0040 | 240 | 5.5646 | - | - |
| 0.0042 | 250 | 5.5783 | - | - |
| 0.0044 | 260 | 5.5344 | - | - |
| 0.0045 | 270 | 5.523 | - | - |
| 0.0047 | 280 | 5.4969 | - | - |
| 0.0049 | 290 | 5.5407 | - | - |
| 0.0050 | 300 | 5.6171 | - | - |
| 0.0052 | 310 | 5.5581 | - | - |
| 0.0054 | 320 | 5.8903 | - | - |
| 0.0055 | 330 | 5.8675 | - | - |
| 0.0057 | 340 | 5.745 | - | - |
| 0.0059 | 350 | 5.6041 | - | - |
| 0.0060 | 360 | 5.5476 | - | - |
| 0.0062 | 370 | 5.3964 | - | - |
| 0.0064 | 380 | 5.3564 | - | - |
| 0.0065 | 390 | 5.3054 | - | - |
| 0.0067 | 400 | 5.2779 | - | - |
| 0.0069 | 410 | 5.206 | - | - |
| 0.0070 | 420 | 5.2168 | - | - |
| 0.0072 | 430 | 5.1645 | - | - |
| 0.0074 | 440 | 5.1797 | - | - |
| 0.0076 | 450 | 5.2526 | - | - |
| 0.0077 | 460 | 5.1768 | - | - |
| 0.0079 | 470 | 5.3519 | - | - |
| 0.0081 | 480 | 5.2982 | - | - |
| 0.0082 | 490 | 5.3229 | - | - |
| 0.0084 | 500 | 5.3758 | - | - |
| 0.0086 | 510 | 5.2478 | - | - |
| 0.0087 | 520 | 5.1799 | - | - |
| 0.0089 | 530 | 5.1088 | - | - |
| 0.0091 | 540 | 4.977 | - | - |
| 0.0092 | 550 | 4.9108 | - | - |
| 0.0094 | 560 | 4.811 | - | - |
| 0.0096 | 570 | 4.7203 | - | - |
| 0.0097 | 580 | 4.6499 | - | - |
| 0.0099 | 590 | 4.4548 | - | - |
| 0.0101 | 600 | 4.2891 | - | - |
| 0.0102 | 610 | 4.1881 | - | - |
| 0.0104 | 620 | 4.6 | - | - |
| 0.0106 | 630 | 4.5365 | - | - |
| 0.0107 | 640 | 4.3086 | - | - |
| 0.0109 | 650 | 4.0452 | - | - |
| 0.0111 | 660 | 3.9041 | - | - |
| 0.0112 | 670 | 4.3938 | - | - |
| 0.0114 | 680 | 4.3198 | - | - |
| 0.0116 | 690 | 4.1294 | - | - |
| 0.0117 | 700 | 4.077 | - | - |
| 0.0119 | 710 | 3.9174 | - | - |
| 0.0121 | 720 | 4.1629 | - | - |
| 0.0123 | 730 | 3.9611 | - | - |
| 0.0124 | 740 | 3.7768 | - | - |
| 0.0126 | 750 | 3.5842 | - | - |
| 0.0128 | 760 | 3.1196 | - | - |
| 0.0129 | 770 | 3.6288 | - | - |
| 0.0131 | 780 | 3.273 | - | - |
| 0.0133 | 790 | 2.7889 | - | - |
| 0.0134 | 800 | 2.5096 | - | - |
| 0.0136 | 810 | 1.8878 | - | - |
| 0.0138 | 820 | 2.3423 | - | - |
| 0.0139 | 830 | 1.7687 | - | - |
| 0.0141 | 840 | 2.0781 | - | - |
| 0.0143 | 850 | 2.4598 | - | - |
| 0.0144 | 860 | 1.7667 | - | - |
| 0.0146 | 870 | 2.6247 | - | - |
| 0.0148 | 880 | 1.916 | - | - |
| 0.0149 | 890 | 2.0817 | - | - |
| 0.0151 | 900 | 2.3679 | - | - |
| 0.0153 | 910 | 1.418 | - | - |
| 0.0154 | 920 | 2.7353 | - | - |
| 0.0156 | 930 | 1.992 | - | - |
| 0.0158 | 940 | 1.4564 | - | - |
| 0.0159 | 950 | 1.4154 | - | - |
| 0.0161 | 960 | 0.9499 | - | - |
| 0.0163 | 970 | 1.6304 | - | - |
| 0.0164 | 980 | 0.9264 | - | - |
| 0.0166 | 990 | 1.3278 | - | - |
| 0.0168 | 1000 | 1.686 | 0.4965 | 0.4897 |
Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

MatryoshkaLoss:

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

MultipleNegativesRankingLoss:

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```